New Input Generator Framework in Avogadro 2

Avogadro 1.x had quite a large number of input generators that came from very humble beginnings. They were designed to be easy to write, and to give a simple path from a structure in Avogadro to something that could be used as an input file in one of many codes. Our basic approach was to add a C++ class per program we targeted, with one or two special cases. This meant that to develop an input generator it was necessary to learn some of the Avogadro API, and to at least compile a plugin (matching our compiler, Qt, library versions, etc). It also led to minor differences between the different input generators, and a lot of copying/pasting of boilerplate code.

Avogadro 2 showing an ethane molecule

When developing the input generators for Avogadro 2 as part of the Open Chemistry project we wanted to make it easier to add new generators. We put a lot of thought into how to make this possible, and how to maintain a native look and feel without necessarily making an input generator developer learn C++, Qt, Avogadro and everything that goes along with setting up a development environment. The new input generator framework is largely language agnostic, with a minimum of assumptions. It currently executes the Python interpreter, but that is largely an artifact of the fact we have only developed input generators using Python.

Avogadro 2 NWChem input generator with syntax highlighting

The input generators are executed in a separate process, using several passes to get the display name, the options supported, the syntax highlighting rules and finally to actually generate the input. The current pass is communicated using command-line arguments, and input is passed to the program on standard input, formatted as JSON. The results should be passed back on the standard output stream and, depending on the pass, should be either JSON results or the actual input file. We also do some post-processing of the input file, where the molecular geometry can be inserted following the specified format. This command line API is documented here. The NWChem input generator is the first to add syntax highlighting in an external plugin; the GAMESS input generator shows an approach using C++ ported from Avogadro 1.x.
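To make the multi-pass idea concrete, here is a minimal sketch of what the calling side might look like if it were written in Python. It is not the code Avogadro 2 uses (that lives in the C++ application); the helper names `run_pass` and `query_generator` are made up for illustration, and only the three flags that appear in the NWChem script are used.

```python
import json
import subprocess
import sys


def run_pass(script, flag, payload=None):
    """Run one pass of a generator script and return what it wrote to stdout.

    `payload`, if given, is serialized as JSON and fed to the script's stdin.
    """
    stdin_text = json.dumps(payload) if payload is not None else ""
    result = subprocess.run(
        [sys.executable, script, flag],  # one process per pass
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise if the script exits with an error
    )
    return result.stdout


def query_generator(script, request):
    """Hypothetical driver: ask for the name, the options, then the input."""
    name = run_pass(script, "--display-name").strip()
    options = json.loads(run_pass(script, "--print-options"))
    # The last pass is returned as raw text, since depending on the pass the
    # output may be JSON or the input file itself.
    generated = run_pass(script, "--generate-input", request)
    return name, options, generated
```

Because each pass is a fresh process, a script that raises an exception only fails that one call, and the caller can report the error rather than going down with it.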

This approach assures that an input generator cannot crash or hang the interface, that licensing is not an issue (separate execution process), and it gives input generator developers the freedom to concentrate on turning options into the appropriate input file without worrying about the details of the application it is being used in. With relatively minor modifications Avogadro 2 could look for other file extensions and execute the appropriate interpreter, or simply execute the programs found in a given path. These files can be modified directly. If the options change it is currently necessary to restart Avogadro, but if the input generation changes those changes are reflected in Avogadro the next time the generator is run. Menu entries are added dynamically at program start up, and this concept could be extended to more of Avogadro. The main block of the NWChem input generator is shown below.

if __name__ == "__main__":
  parser = argparse.ArgumentParser('Generate a NWChem input file.')
  parser.add_argument('--debug', action='store_true')
  parser.add_argument('--print-options', action='store_true')
  parser.add_argument('--generate-input', action='store_true')
  parser.add_argument('--display-name', action='store_true')
  args = vars(parser.parse_args())

  debug = args['debug']

  if args['display_name']:
    print("NWChem")
  if args['print_options']:
    print(json.dumps(getOptions()))
  elif args['generate_input']:
    print(json.dumps(generateInput()))

A snippet of the input generation code is shown below, where a variable is populated with what will be the raw input passed to the code.

def generateInputFile(opts):
  # Extract options:
  title = opts['Title']
  calculate = opts['Calculation Type']
  theory = opts['Theory']
  basis = opts['Basis']
  multiplicity = opts['Multiplicity']
  charge = opts['Charge']
  # Preamble
  nwfile = ""
  nwfile += "echo\n\n"
  nwfile += "start molecule\n\n"
  nwfile += "title \"%s\"\n"%title
  # Coordinates
  nwfile += "geometry units angstroms print xyz autosym\n"
  nwfile += "$$coords:Sxyz$$\n"
  nwfile += "end\n\n"
  # More stuff here...
  return nwfile
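The `$$coords:Sxyz$$` line above is the placeholder that the application fills in during post-processing. As a rough sketch of what that substitution amounts to, the following assumes that `Sxyz` means "element symbol followed by x, y and z", one atom per line. The function names and the exact column widths are assumptions for illustration and not the format Avogadro 2 emits.

```python
def format_coordinates(atoms):
    """Format (symbol, x, y, z) tuples as one 'S x y z' line per atom."""
    return "\n".join(
        "%-2s %10.5f %10.5f %10.5f" % (symbol, x, y, z)
        for symbol, x, y, z in atoms
    )


def insert_geometry(template, atoms, placeholder="$$coords:Sxyz$$"):
    """Replace the coordinate placeholder in a generated input file."""
    return template.replace(placeholder, format_coordinates(atoms))


# Usage: a two-atom hydrogen molecule dropped into a geometry block.
template = "geometry units angstroms print xyz autosym\n$$coords:Sxyz$$\nend\n"
print(insert_geometry(template, [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.74)]))
```

Keeping the geometry out of the script means the generator never has to parse the molecule just to echo it back.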

We hope that this framework will make it much easier for researchers to customize their input generator scripts to their needs, and we would welcome your feedback on how we could make it even easier. If there are other languages of interest we could add examples. The major requirement is that the language can create a self-contained script or executable that can use standard in/out, has some string handling capabilities and supports JSON.

Using VTK's Image Regression Tests in Avogadro 2

One of the really nice features of VTK's testing framework is the use of image-based regression tests. These allow developers to write tests that result in a final image, which can be recorded and compared to known baseline images in order to verify that the OpenGL rendering code is rendering the same (or similar) image on all platforms. If this fails then CDash will display the image the test produced, the baseline image it was compared to and an image difference. Any project that performs rendering or visualization needs tests like these in addition to unit tests if they want to assure visualization code continues to function as expected across a range of platforms.

We recently extracted the relevant code from the VTK testing framework to perform image-based regression tests in Avogadro 2, with the bulk of that code living in utilities/vtktesting/imageregressiontest.h. This is currently used in one of the tests, with plans to extend it to cover all major types of rendering. It can be seen in action in tests/qtopengl/glwidgettest.cpp, with the important lines that take the snapshot and do the image comparison being:

  // Grab the frame buffer of the GLWidget and save it to a QImage.
  QImage image = widget.grabFrameBuffer(false);
  // Set up the image regression test.
  Avogadro::VtkTesting::ImageRegressionTest test(argc, argv);
  // Do the image threshold test, printing output to the std::cout.
  return test.imageThresholdTest(image, std::cout);

The CMake code that feeds in the command line arguments, and ensures the test runs correctly is in tests/qtopengl/CMakeLists.txt, and largely involves passing in paths to the baseline directory, a temporary directory and the test name (using the standard CMake generated test driver).

  add_test(NAME "QtOpenGL-${test}"
    COMMAND
      AvogadroQtOpenGLTests "${testname}test"
      "--baseline" "${AVOGADRO_DATA_ROOT}/baselines/avogadro/qtopengl"
      "--temporary" "${PROJECT_BINARY_DIR}/Testing/Temporary")

Valid baseline image

The above is the baseline image, that is stored in a known location and compared with the image produced by the test (shown below).

Test image produced

If the images don't match a difference image is produced and uploaded (shown below). In this case you can see that an extra sphere was rendered, and this can clearly be seen in the difference image. There is also a numerical difference returned by the test, which is a measure of how much the images differ. The tolerance can be tweaked depending on the test to allow some minor pixel differences, although care must be taken not to raise the number too high.
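The idea of a numerical difference checked against a tolerance can be sketched in a few lines of Python. This is a toy metric (a sum of absolute per-pixel differences on grayscale values), not the algorithm VTK uses, and the function names are invented for the example.

```python
def image_difference(test_pixels, baseline_pixels):
    """Sum of absolute per-pixel differences between two same-sized images.

    Images are lists of rows, each row a list of grayscale values (0-255).
    """
    if len(test_pixels) != len(baseline_pixels):
        raise ValueError("images have different heights")
    total = 0
    for test_row, base_row in zip(test_pixels, baseline_pixels):
        if len(test_row) != len(base_row):
            raise ValueError("images have different widths")
        total += sum(abs(t - b) for t, b in zip(test_row, base_row))
    return total


def passes_threshold(test_pixels, baseline_pixels, threshold):
    """True if the images differ by no more than the allowed threshold."""
    return image_difference(test_pixels, baseline_pixels) <= threshold


# Usage: one stray bright pixel fails a tight threshold.
baseline = [[0, 0], [0, 0]]
rendered = [[0, 0], [0, 255]]
print(passes_threshold(rendered, baseline, threshold=10))
```

A looser threshold would let the stray pixel through, which is exactly why the number should not be raised too far.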

Image difference from test to valid

We have not implemented it in Avogadro 2 yet, but VTK can use multiple baselines and returns the smallest image difference. This allows for OS/GPU specific baselines to be uploaded where necessary as an alternative to increasing the tolerance. Using special tags returned by the tests in the standard output will prompt the ctest command to upload the image files when necessary (in the case the baseline image cannot be found, or the image comparison fails).
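Picking the best of several baselines is simple to express. The sketch below takes any difference function and returns the baseline with the smallest score. It illustrates the idea only and is not VTK's implementation; the baseline names are hypothetical.

```python
def best_baseline(test_image, baselines, difference):
    """Return (name, score) for the baseline closest to the test image.

    `baselines` maps a label to an image; `difference` is any callable that
    returns a non-negative number for a pair of images.
    """
    if not baselines:
        raise ValueError("at least one baseline is required")
    scores = {name: difference(test_image, image)
              for name, image in baselines.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def flat_difference(a, b):
    """Toy metric for flat lists of grayscale values."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Usage: the test passes if the smallest difference is within tolerance.
baselines = {"default": [0, 0, 0, 0], "other-gpu": [0, 0, 0, 250]}
print(best_baseline([0, 0, 0, 255], baselines, flat_difference))
```

With this approach a platform that renders slightly differently gets its own baseline, and the tolerance can stay tight for everyone else.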

First Open Chemistry Beta Release

Open Chemistry

We are pleased to announce the first beta release of the Open Chemistry suite of cross-platform, open-source, BSD-licensed tools and libraries - Avogadro 2, MoleQueue and MongoChem. They are being released in beta, before all planned features are complete, to get feedback from the community following the open-source mantra of “release early, release often”. We will be making regular releases over the coming months, as well as automatically generating nightly binaries. A Source article from 2011 introduced the project, slides from FOSDEM describe it more recently, and the 0.5.0 release binaries can be downloaded here.

Open Chemistry workflow

These three desktop applications can each be used independently, but also have the capability of working together. Avogadro 2 is a rewrite of Avogadro that addresses many of the limitations we saw. This includes things such as the rendering code, scalability, scriptability, and increased flexibility, enabling us to effectively address the current and upcoming challenges in computational chemistry and related fields. MoleQueue provides desktop services for executing standalone programs both locally and on remote batch schedulers, such as Sun Grid Engine, PBS and SLURM. MongoChem provides chemically-aware search, storage, and informatics visualization using MongoDB and VTK.

Open Chemistry library organization

Avogadro 2

Avogadro 2 is a rewrite of Avogadro; please see the recently-published paper for more details on Avogadro 1. Avogadro has been very successful over the years, and we would like to thank all of our contributors and supporters, including the core development team: Geoff Hutchison, Donald Curtis, David Lonie, Tim Vandermeersch, Benoit Jacob, Carsten Niehaus, and Marcus Hanwell. We also recently obtained permission from almost all authors to relicense the existing code under the 3-clause BSD license, which will make migration of code to the new architecture much easier.

Avogadro 2 rendering a molecular orbital

Some notable new features of Avogadro 2 include:

  • Scalable data structures capable of addressing the needs of large molecular systems.
  • A flexible file I/O API supporting seamless addition of formats at runtime.
  • A Python-based input generator API, creating an input for a range of quantum codes.
  • A specialized scene graph for supporting scalable molecular rendering.
  • OpenGL 2.1/GLSL based rendering, employing point sprites, VBOs, etc.
  • Unit tests for core classes, with ongoing work to improve coverage.
  • Binary installers generated nightly.
  • Use of MoleQueue to run computational codes such as NWChem, MOPAC, GAMESS, etc.

Avogadro is not yet feature complete, but we invite you to try it out along with the suite of applications as we continue to improve it. The new Avogadro libraries feature much finer granularity; whereas before we provided a single library with all API, there is now a layered API in multiple libraries. The Core and IO libraries have minimal dependencies, with the rendering library adding a dependence on OpenGL, and the Qt libraries adding Qt 4 dependencies. This allows us to reuse the code in many more places than was possible before, with rendering possible on a server without Qt/X, and the Core/IO libraries being suitable for command line use or integration into non-graphical applications.

MoleQueue

MoleQueue is a new application developed to satisfy the need to execute computational chemistry codes locally and remotely. Rather than adding this functionality directly to Avogadro 2, it has been developed as a standalone system-tray resident application that runs a graphical application and a local server (using local sockets for communication). It supports the configuration of multiple queues (local and remote), each containing one or more programs to be executed. Applications communicate with MoleQueue using JSON-RPC 2.0 over a local socket, and receive updates as the job state changes. A recent Source article describes MoleQueue in more detail.

MoleQueue queue configuration

In addition to the system-tray resident application, MoleQueue provides a Qt 4-based client library that can easily be integrated into Qt applications, providing a familiar signal-slot based API for job submission, monitoring, and retrieval. The project has remained general in its approach, containing no chemistry specific API, and has already been used by several other projects at Kitware in different application domains. Communicating with the MoleQueue server from other languages is quite simple, with the client code having minimal requirements for connecting to a named local socket and constructing JSON strings conforming to the JSON-RPC 2.0 specification.
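To show how little a client needs, here is a sketch of building and reading JSON-RPC 2.0 messages in Python. The envelope fields (`jsonrpc`, `method`, `params`, `id`, `result`, `error`) come from the JSON-RPC 2.0 specification; the method name `submitJob` and its parameters are placeholders and should be checked against the MoleQueue documentation. The socket connection itself is left out.

```python
import itertools
import json

# Each request carries an id so that replies can be matched to requests.
_request_ids = itertools.count(1)


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request string."""
    message = {"jsonrpc": "2.0", "method": method, "id": next(_request_ids)}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def parse_response(text):
    """Return the result of a JSON-RPC 2.0 response, or raise on error."""
    message = json.loads(text)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]


# Usage: the method and parameter names here are hypothetical.
print(make_request("submitJob", {"queue": "local", "program": "nwchem"}))
```

Any language with JSON support and access to local sockets could do the same, which is the point of keeping the protocol this simple.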

MongoChem

MongoChem is another new application developed as part of the Open Chemistry suite of tools, leveraging MongoDB, VTK, and AvogadroLibs to provide chemical informatics on the desktop. It seeks to address the need for researchers and groups to be able to effectively store, index, search and retrieve relevant chemical data. It supports the use of a central database server where all data can be housed, and enables the significant feature set of MongoDB to be leveraged, such as sharding, replication and efficient storage of large data files. We have been able to reuse several powerful cheminformatics libraries such as Open Babel and Chemkit to generate identifiers, molecular fingerprints and other artifacts as well as developing out features in the Avogadro libraries to support approaches to large datasets involving many files.

MongoChem

We have taken advantage of the charts developed in VTK and 2D chemical structure depiction in Open Babel to deliver immersive charts that are capable of displaying multiple dimensions of the data. Linked selection means that a selection made in one view, such as the parallel coordinates view, is reflected in the scatter plot matrix and the table view. The detail dialog for a given molecule shows 2D structure depiction, an interactive 3D visualization when geometry is available and support for tagging and/or annotation. We have also developed an early preview of a web interface to the same data using ParaViewWeb, enabling you to share data more widely if desired. This also features a 3D interactive view using the ParaViewWeb image streaming technology, which works in almost all modern browsers.

Putting Them Together

Each of the applications in the Open Chemistry suite listens for connections on a named local socket, and provides a simple JSON-RPC 2.0 based API. Avogadro 2 is capable of generating input files for several computational chemistry codes, including GAMESS and NWChem, and can use MoleQueue to execute these programs and keep track of the job states. Avogadro 2 can also query MongoChem for similar molecules to the one currently displayed, and see a listing sorted by similarity. MongoChem is capable of searching large collections of molecules, and can use the RPC API to open any selected molecule in the active Avogadro 2 session.

Acknowledgements

The development of the Open Chemistry workbench has been funded by a US Army SBIR with the Engineering Research Development Center under contract (W912HZ-12-C-0005) at Kitware, Inc.

Originally published on the Kitware blog

Tour of Druthers During Saratoga Beer Week

Last week, during Saratoga Beer Week, a small group of intrepid colleagues ventured up to Saratoga Springs to revisit Druthers, a local brew pub that opened last year. On Thursday they were offering tours of their brewery. For just $10 you got to sample a pint of their fine brew, keep the special souvenir glass and learn about their brewing process and philosophy.

Shot of the brewery

We had a great time, and maybe even learned a little. It is clear that the selection of bars in Saratoga Springs is growing, with some amazing brew pubs and tap rooms springing up. The list of events and venues during beer week made this quite clear. Glens Falls is still our longest trip as a group in search of great local beers, although one day it would be great to arrange a trip out to Ommegang to see an American take on Belgian beer. I have already been once, and really like their beers.

FOSDEM: Open Science and Open Chemistry

I will be talking about the Open Chemistry Project at FOSDEM this year in the FOSS for scientists devroom at 12:30pm on Saturday. I will discuss the development of a suite of tools for computational chemists and related disciplines, which includes the development of three desktop applications addressing 3D molecular structure editing, input preparation, output analysis, cheminformatics and integration with high-performance computing resources.

Open Chemistry

On Sunday at 3pm Bill Hoffman will be speaking in the main track about Open Science, Open Software, and Reproducible Code. Bill and Alexander Neundorf will also be talking about Modern CMake in the cross desktop devroom on Saturday.

FOSDEM is one of the first conferences I attended (possibly the first, I can't remember if I went to a science conference before this). It will be great to return after so many years, and hopefully meet old colleagues and a few new ones. Please find me, Bill or Alex if you would like to discuss any of this work with us. I fly out tomorrow, and hope to get over jet lag quickly. Once FOSDEM is over we will be visiting Kitware SAS in Lyon, France for a couple of days (this is my first trip to our new office).

Then I have a few days in England visiting friends and family before heading back to the US.

Avogadro Paper Published Open Access

In January of last year I was invited to attend the Semantic Physical Science Workshop in Cambridge, England. That was a great meeting where I met like-minded scientists and developers working on adding semantic structure to data in the physical sciences. Peter managed to bring together a varied group with many backgrounds, and so the discussions were especially useful. I was there to think about how our work with Avogadro, and the wider Open Chemistry project might benefit from and contribute to this area.

Avogadro graphical abstract

My thanks go out to Peter Murray-Rust for inviting me to the Semantic Physical Science meeting and helping us to get the Avogadro paper published in the Journal of Cheminformatics as part of the Semantic Physical Science collection. Noel O'Boyle wrote up a blog post summarizing the Avogadro paper accesses in the first month (shown below - thanks Noel) compared to the Blue Obelisk paper and the Open Babel paper. We only just got the final version of the PDF/HTML published in early January, but the paper already has 12 citations according to Google Scholar, and is showing as the second most viewed article in the last 30 days and the most viewed article in the last year. The paper made the Chemistry Central most accessed articles list in October and November.

I made a guest blog post talking about open access and the Avogadro paper, which was later republished for a different audience. I would like to thank Geoffrey Hutchison, Donald Curtis, David Lonie, Tim Vandermeersch and Eva Zurek for the work they put into the article, along with our contributors, collaborators and the users of Avogadro. If you use Avogadro in your work please cite our paper, and get in touch to let us know what you are doing with it. As we develop the next generation of Avogadro we would appreciate your input, feedback and suggestions on how we can make it more useful to the wider community.

The Roller Coaster of 2012

It has been a long time since I wrote anything on here, I am still alive and kicking! 2012 was another roller coaster of a year, with many good and bad things happening. Louise and I got our green cards early on in the year (massive thanks to my employer), which was great after having lived in the US for over five years now. We started house hunting a few months after that, which was an adventure and a half.

As we were in the process of looking for a house I was promoted to technical leader at Kitware, and I continue to work on our Open Chemistry project. We ended up falling in love with the first house we found, and found a great realtor who took us back there for a second look. We then learned how different buying a house is in the US versus England, but after several rounds of negotiations came to an agreement. We had a very long wait for completion, but that all proceeded well in the end.

As we moved out of the place we had been renting for the last three years we found out just how bad some landlords can be about returning security deposits...that is still ongoing and has not been a fun process. We never rented in England, but many friends have assured us that this isn't that unusual. Our move actually went very smoothly though, and we have some great friends who helped us with some of the heavy lifting. We have been learning what it is like to own a home in the country, with a well, septic, large garden etc. The learning curve has been a little steep at times! We attended two weddings (I was a groomsman in one) with two amazing groups of friends - it was a pleasure to be part of the day for two great friends.

I made a few guest blog posts, which I will try to talk more about in another post, and attended some great conferences including the ACS, Semantic Physical Science and Supercomputing. Our Avogadro paper was published, and was recently published in final form (I will write more about this too). I finally cancelled my dedicated server (an old Gentoo box), which I originally took when I was consulting in England. This was very disruptive in the end, and I didn't have a complete backup of all data when it was taken offline. This caused lots of disruption to email (sorry if I never got back to you). I moved to a cloud server with Rackspace in the end, after playing with a few alternatives. I was retired as a Gentoo developer too (totally missed those emails). It was a great experience being a developer and I still value many of the friendships formed during that time. My passion for packaging has waned in recent years, and I tend to use Arch Linux more now (although I still love lots of things about Gentoo).

Just before Xmas our ten year old German Shepherd developed a sudden paralysis in his back legs and had to be put down. It was pretty devastating, after having him from when he was 12 weeks old. He joined our little family just after we got our own place in England, he had five great years in England and another five in the US. He was with me for so much of my life (a degree, loss of my brother, marriage, loss of my sister, moving to another country, birth of our first child, getting a "real" job). We had family over for the holidays as we call them over here (Xmas and New Year back home), which was great but we may not have been the best of company after having just lost our dog.

I think I skipped lots of stuff too, but it was quite a year! Hoping for more of a steady ride this year to say the least.

Open Chemistry, VTK and ParaViewWeb

Last year David Lonie, now a new Kitware employee, worked on a Google Summer of Code project to add better support for chemical structure visualization to VTK. More recently, Kyle Lutz added representations to ParaView to expose some of this new functionality for ParaView users. Once that was in place we were able to work with Sébastien Jourdain to expose this functionality in ParaViewWeb and expose parts of the MongoDB database we have been working on as part of the Open Chemistry project. You can check out the live demo here, or take a look at the screen shot below.

ParaViewWeb and Open Chemistry live demo

It was up and running within a day, and in another day we had a query page and summaries exposed in ParaViewWeb with some simple queries. ChemData exposes more complex searches and 2D visualizations of the data contained. The 2D images are created using Open Babel's SVG rendering, and saved to the database as PNGs for speed and the 3D structure is rendered using ParaViewWeb and image based delivery right now. You can interact with the 3D geometry both inline, or full screen. We will be extending this to show electronic structure and adding other features in the near future too.

Open Science, Open Access and Open Source

I have been thinking this over for quite a while, and have written this post several times over in my mind. As an undergraduate student I remember admiring scientists and imagining how amazing it must be to have a job where you got to discover new things, think of better solutions to problems facing our society and making the world a better place. As my studies continued I aspired to become one of those researchers, and made the decision to take my studies further and applied to do a PhD.

As a PhD student I enjoyed learning more about materials, and was excited to be working with gold nanoparticles and research into how we might make real devices out of this novel new material in the Nanomaterial Engineering Group. It was exciting, challenging and fascinating using techniques such as X-ray and neutron reflectometry, electron and atomic force microscopy and Langmuir-Blodgett troughs. As I learned more through my work I became frustrated with the quality of the software I used, and had always imagined that "real scientists" had better tools available to them. It became even more frustrating when I realized how bad some of the instrument control software was, and how so many of the file formats could only be used in one or two expensive and hard to use programs that only worked on one or two platforms.

Towards the end of my PhD I decided I would like to take some action. I had been trying to draw and render images of molecular structures, and wanted a way to do simple geometry optimizations for posters, papers and web pages. At first I tried to do some of this using an existing commercial package, but it only worked on Windows and we only had one license for the department. The training provided to me as a researcher in areas such as programming and analysis was disappointing, and all too often generic tools such as Word, Powerpoint and Excel were the most viable choice for preparing, analyzing and presenting our work. I began writing more software, but much of it was written from scratch with little guidance. As I searched for a better way I came across some open source libraries and tools.

I found a program run by Google called "Summer of Code" where they offered me the opportunity to "flip bits not burgers". I was extremely lucky to find an idea on KDE's idea page for a molecule editor in Kalzium. I was very excited, and had been using KDE for many years. This was a pivotal moment for me, where my life and career took a twist I never expected into the world of open science - and I have loved every minute of it.

It was through that work that I became involved in the Avogadro project, and later Open Babel, and met Geoff, who later that year offered me a position in his new research group. This was an exciting opportunity as not only did we share a passion for correlating experimental and computational techniques, Geoff was also very active in open chemistry. After I moved out to Pittsburgh Geoff introduced me to the Blue Obelisk, and I now proudly count myself as one of their un-members. Last year we published an open access paper on the Blue Obelisk, five years on.

After a two year postdoctoral position with Geoff, who was extremely supportive of my work in open chemistry, I met Bill Hoffman from Kitware. I knew that Kitware developed CMake, but beyond that was not really aware of what they did. It turned out that they were involved in much more than just CMake, with open source tools and frameworks such as VTK, ParaView, ITK, CDash and more. They had been working on open scientific software for over a decade, and they were hiring! They weren't just making applications either, they were tackling the whole problem including development, testing and validation of open-source, cross-platform applications and frameworks.

After accepting a position with Kitware in 2009 one thing I never really appreciated was just how poor access is to publicly funded research. I can no longer access scientific papers I and others wrote, that were funded with tax payer money from both the UK and the US! I think that is terrible, and later realized I had become part of the scholarly poor; Peter wrote a follow up detailing the plight of those of us in industry. There is currently a raging debate on open access, and campaigns such as The Cost of Knowledge need our support. The products of publicly funded research should be available to all, whether they are in academia, industry, government or anywhere else.

There are too many black boxes in science today, too much published work that is not available to all or reproduced by others. Mathematics used to be the language of science, but more and more it is computer software that is needed to learn more, and too much of this code is closed, unpublished and poorly shared. Papers must include mathematical proofs, or refer to proofs already published, but it is common to see work published that used closed, proprietary package X to conduct a simulation. This is changing, and Scientific American recently published an article on how "Secret Computer Code Threatens Science". Science also published an article about "Shining Light into Black Boxes", detailing the growing problem of withheld source code preventing meaningful peer review and reproducibility of research.

Michael Nielsen published a book called "Reinventing Discovery" that talks about the value of networked science, and is well worth a read if you have not yet had a chance. The Panton Principles outline the need to make scientific data open, and the Science Code Manifesto calls for openly available code in science. The core goals of the Blue Obelisk are open data, open standards and open source. I think for science to progress we must embrace openness and sharing, and resist the urge to hoard data, building up small empires on proprietary code and data.

One thing I hope to see come from all of the controversy of the Research Works Act is a clarification that publicly funded research should be available to all, whether you think they will understand it or not. Scientists need to get better at communicating with the general public, and being more transparent about how research is done. I think open science will give us a chance to increase public engagement in science, which seems to be a growing problem in an age where we can all access the internet and a wealth of knowledge available on it.

I think that we need to figure out sustainable ways to fund the development of open software platforms to enable the next generation of researchers to push back the frontiers of science. We need to remember that we are publishing to share the results of (often publicly funded) research, and so we should be using liberal licenses such as CC-BY and CC0 that allow reuse and further analysis. We also need liberally licensed software that allows those same things, with simple licenses such as BSD and Apache 2.0. These libraries should contain well-tested implementations of data structures, algorithms and best structures, along with training for researchers to help them take advantage of these resources. If there is a better way to do something, contributions and integration should be encouraged as is the case in most open source communities.

Our Open Chemistry project recently got Phase II SBIR funding, and I am very excited to be leading that work at Kitware. It is part of a collaborative, open effort to improve the tools and frameworks available in the area leveraging new software processes to enable wider community involvement.

Leap Day: Never Enough Time

What a busy year it has been so far, a leap day hardly seems enough to help me catch up! I started off the year with a meeting in Cambridge, England on Semantic Physical Science which was hosted by Peter Murray-Rust. I ended up leading the working group on CML and developing a roadmap to move forward. Peter blogged about this on my birthday (by chance) and you can see the video of my summing up of the results, along with all the other videos from the final day.

While I was back in England I took the opportunity to visit friends and family, along with a day trip to Liverpool to meet with Abbie and Jens. While I was there we discussed some plans around alternate inputs for Avogadro for an upcoming MP visit at the end of January. I found some time to blog about that on the Kitware blog, and Abbie wrote up the visit on their site. I think engaging more people in chemistry is important, and whilst I don't think the interaction is ideal at the moment I was pleased to see them enjoying it. The Kinect is something that many groups can purchase, and if it helps engage a wider audience in science I think that is a great thing.

I am very excited about the work we are doing in Open Chemistry at Kitware. We have been bringing web sites and testing online, and have begun engaging more people in the development process. The official announcement of our Phase II funding went out in January too, and I set up an Open Chemistry group on Google+ if you would like to follow new developments there.

I am especially excited, after meeting some people from EMSL at the Semantic Physical Science meeting in Cambridge, about the possibilities of working with NWChem more in the future. The open source license they switched to last year is of a very similar liberal nature to that of many of the open source projects we work on at Kitware. There is a large array of techniques available in NWChem, and interest in correlating computational and experimental observables.

We have also been extending Gerrit to support topic branch reviews, and switched VTK to use it for all code submissions. You can see proposed topics and they will trigger automatic build tests using CDash@Home for members of the core group. The Open Chemistry projects are also using the same Gerrit server for code review, and I am adding automated build testing of topics as I find time (any more leap days would help).

As my extra day draws to a close I realize there is still so much more I should get down. I will aim for more discipline in adding regular entries here. You can follow my Google+ updates if you would like more updates on open source, open science and the life of a scientific software developer.

Reflections on 2011: Open Source, Open Science and Open Chemistry

It has been so long since I uttered a word here. 2011 was certainly a busy year for me, and I hope to dedicate more time to telling people about what I am up to in 2012. In preparation for that I have spent some time moving my blog to some new hardware, in the cloud. I also got to the bottom of the poor performance of page loads, and things should be much snappier now. After all that I figured it was time for a cosmetic refresh, so after upgrading Serendipity I selected a new and hopefully cleaner theme.

As an extra special treat I updated the photo to something a little more recent too: me emerging from my TARDIS with a mug of espresso in hand! Kitware grew a lot last year, so much so that we had to take space in a new building across the road. It was decided that our scientific computing team would move, along with the communications team. In the move I got my own office, with a view of the old office across the street.

VTK was accepted as a mentoring organization in the Google Summer of Code program. We were lucky enough to get two very talented and tenacious students who produced some great work over the summer. We also continued improving and extending Gerrit, and thanks to the Google Summer of Code I had the opportunity to attend both the Mentor Summit and the Git Together (held the day after). Not satisfied with two meetings in one trip I also attended a small portion of the Open Science Summit, and hope to be able to attend the whole thing if it happens again in 2012.

Speaking of Open Science, 2011 was a big year for the area both on a personal level and in the wider community. I talked about our work in open science at several conferences, and more specifically the work we have been doing in Open Chemistry. I wrote a Source article introducing the work that we have done in Open Chemistry since I joined Kitware, and we recently acquired the openchemistry.org domain and have begun populating it. We were also awarded a Phase II SBIR which gives us two years of funding to develop many of the applications and libraries that I mentioned in the article.

The Science Code Manifesto was conceived in 2011, and Michael Nielsen released his new book Reinventing Discovery. There was also Open Access Week, which highlighted the need for open access to scientific journals and data. I was very pleased to be a coauthor on two open access articles in 2011, the Quixote project and the Blue Obelisk five years on. I was also honored to receive my own Blue Obelisk award in 2011 from Peter Murray-Rust!

There is so much else, but I am out of time for now and this post is already very long. Google+ was released to the masses, you can see a little more of me there, and I have created an Open Chemistry page that I will try to keep updated over the coming year. I was invited to a workshop on Semantic Physical Science in Cambridge, England and so I will be starting my traveling very much earlier than usual - leaving next Thursday. Here is to a great new year, one in which I hope we as a community can make significant progress in opening science for the world, and creating a truly shared set of tools for all!

I hope that 2012 is the year more of us start sharing in meaningful ways.

Conferences: Talking Open Science at OSCON, Desktop Summit and Chemical Databases Meeting

Over the last two months I have had one of my most hectic travel schedules ever. It started with OSCON, and a panel discussion about "Practicing Open Science". This one was a bit of a surprise, as Bill Hoffman was originally presenting with Will Schroeder and Brian Wylie, from Sandia National Laboratories. As Bill couldn't make it we decided to change the content of my section, and talk about the new open chemistry area that I have been working on for about four years now. Will went first, followed by me and a wrap up from Brian, with a nice flow between Kitware working on open science for over a decade, me growing a new area of open science (now at Kitware) and Brian giving a government perspective on open source and open science. The slides are below and on slideshare if you would like to take a look.

I thoroughly enjoyed OSCON, and would love to attend future events. The toughest thing was deciding which talks to attend, as there were often multiple tracks with talks of interest to me. This was also by far the largest and most commercialized open source event I have attended so far, in the beautiful city of Portland, OR. I couldn't stick around for long after the conference as I was flying out to England on the following Tuesday, and on to Berlin, Germany on Friday to attend the Desktop Summit. This was my first time in Germany, and I was looking forward to exploring Berlin a little, along with some time to catch up with family and friends in England before and after the conference. I talked about "Open Source Visualization of Scientific Data" on the final day of the main conference, and was very pleased to have a large and interested audience. Here I also discussed my work in open chemistry, along with a lot of the other work we do at Kitware in the Scientific Computing group.

I stayed for the remainder of the conference, attending my first KDE e.V. meeting, and was joined by Bill Hoffman towards the end of the week. Bill gave a workshop on using CMake, and I helped out with that, along with taking part in several BoF sessions and meetings. It was a very hectic week, with a very different feel to OSCON and a lot of great presentations, BoFs and hacking sessions. I also had the opportunity to meet up with Alexander Neundorf, who was an intern at Kitware for half a year, and several other KDE developers interested in build systems, software process, testing, coverage and related areas.

Then I was back home for just over a week before braving the elements and heading straight for the path of hurricane Irene. I was invited to the 5th Meeting on U.S. Government Chemical Databases and Open Chemistry where I talked about "Chemical Databases and Open Chemistry on the Desktop". This meeting was very focused on chemical databases and the open chemistry I have been working on so hard for the last few years. It was a great experience to be able to see what others are working on, and discuss possible points for future collaboration. There is some amazing work happening in this area, and this meeting helped me gain greater clarity on how my work at Kitware can fit into the larger picture to significantly improve the landscape in open chemistry.

Thanks to Kitware for allowing me to attend, and funding my travel/other expenses, and to my wife and son for tolerating my long absences over the last couple of months. An even bigger thank you to my wife, Louise, for letting me off the hook on my first missed wedding anniversary so that I could present at OSCON! I had some great news about funding for the continued development of many of the ideas discussed in the slides, and so hope to have much more to talk about over the coming months (and years). This post is already pretty long, so I will just say that I hope to continue developing this work and promoting open science, especially in chemistry, materials science, physics and the bio areas. There are lots of other amazing people working in these areas too, and I feel like we are getting to a point where we can create real change to improve the outlook in scientific research.

Talking About Open Source Visualization of Scientific Data at the Desktop Summit

I have begun my journey to the Desktop Summit, making the flight over from the US to Manchester yesterday. I have a short stay in Sheffield to catch up with family before heading out to get my flight to Berlin tomorrow. I will be talking about the work I have done both at and before joining Kitware with the title "Open Source Visualization of Scientific Data". I plan to talk about a range of work from my Google Summer of Code project on Kalzium back in 2007, through to some of the exciting work at Kitware in VTK, ParaView and Titan looking at the challenges of large data, remote visualization and how to integrate the web and smartphones/tablets into the scientific data visualization workflow.

Desktop Summit 2011

Bill Hoffman is also planning to attend, and we will be running a workshop introducing CMake on Thursday. This is my first Desktop Summit, although Bill and I have both attended previous aKademy and Camp KDE meetings. I should be in on time to attend the pre-registration event, and will not be leaving until Saturday. Looking forward to a great summit, catching up with some old friends and making some new ones. Now, I think I should try to get some sleep before my flight tomorrow!


Talking at OSCON 2011 about Open Science

I am currently on a plane bound for Portland, Oregon enjoying the in-plane wi-fi. Will Schroeder, Brian Wylie and I will be talking about "Practicing Open Science" on Friday in the government track. I am standing in for Bill Hoffman, who unfortunately could not make it, and will be discussing the work I have been doing to grow open chemistry both at Kitware and outside of Kitware with many amazing collaborators scattered around the world. I am really excited to have the opportunity to talk at OSCON, and would be happy to meet up and discuss this work if you are at OSCON. Will and Brian are both very passionate about open science too, and they will each give their unique perspectives on practicing open science. I will be there from this evening and don't fly out until early Saturday morning.

OSCON 2011

I am very much looking forward to OSCON, and the major difficulty I have had is choosing between the talks that are all happening at the same time. In some cases there are two or three I would like to see in any given slot. I am hoping to attend the KDE release party tomorrow too, please join us there if you would like to celebrate with us.

Avogadro 1.0.3 Released

I am pleased to be able to announce the availability of Avogadro 1.0.3! What happened to Avogadro 1.0.2, I hear you ask... Shortly after tagging, Michael reported an issue with i18n building/installations. So 1.0.3 contains a couple of very small build system fixes, but see the 1.0.2 release notes for details of most of the fixes.

As always, we appreciate your feedback. There are still a few issues outstanding, but many things were fixed. These binaries are also built against much newer versions of Qt and Open Babel where significant improvements have also been made. There may be one or two more releases of the 1.0 line if necessary (I have streamlined the release process with a view to making more releases), but I would like to focus our efforts on an unstable release for 1.1. Once 1.1 is stable, a 1.2.0 release will be cut and branched. There are lots of new features in master that we would love more feedback on.