Illinois Rocstar has begun a new DOE Phase I SBIR project titled “The Chemlab Chemistry and Materials Science Lifecycle Data Repository.” The Chemlab project is designed to give scientists and engineers a place not only to archive and document their simulation datasets, including provenance information and, at times, output data (or detailed information about how the output data was generated), but also to find those datasets again later.
There are many sources of chemical and materials data on the web; some are commercial, while others are freely available. Many universities maintain pages dedicated to helping their students and researchers find databases that can be accessed through university library subscriptions (e.g., University of Illinois, MIT). There are even data repositories available for some disciplines. There is no shortage of data sources, although it can still be difficult to find what is needed, especially for highly specific or otherwise unusual compounds and materials. The Chemlab project does not aim to be another chemistry/materials database, although we do intend to help researchers locate other databases. Rather, it stems from previous work Illinois Rocstar has performed in the area of Simulation Lifecycle Management. Scientists and engineers performing chemistry- and materials-oriented modeling and simulation (M&S) certainly need to find appropriate data for their simulation activities, but another important and rarely addressed facet of M&S is the so-called “dataset lifecycle” of the simulation datasets and results themselves, as illustrated in the figure below: birth→document→use→archive→search→reuse.
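For readers who think in code, the lifecycle named above can be sketched as an ordered set of stages. This is only an illustration, not part of Chemlab: the names `Stage`, `LIFECYCLE`, and `next_stage` are hypothetical, and the sketch assumes the stages are visited strictly in the order listed.

```python
from enum import Enum


class Stage(Enum):
    """Dataset lifecycle stages, in the order given in the text (names are hypothetical)."""
    BIRTH = "birth"
    DOCUMENT = "document"
    USE = "use"
    ARCHIVE = "archive"
    SEARCH = "search"
    REUSE = "reuse"


# Enum members iterate in definition order, so this list preserves the lifecycle order.
LIFECYCLE = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one.

    Assumption: the lifecycle is linear; the source does not say whether
    stages can be skipped or repeated.
    """
    i = LIFECYCLE.index(stage)
    return LIFECYCLE[i + 1] if i + 1 < len(LIFECYCLE) else None
```

Under this assumption, a dataset that has just been archived would next become searchable, and a dataset at the reuse stage has no further stage.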
From a community perspective, a place to share data (when appropriate) is paramount. Other researchers with permission will be able to find and use the datasets, which in turn facilitates collaboration and reuse of data. Our vision is to offer a public instance of Chemlab, freely available on the web, where researchers have the option to make their data and datasets public. By implementing a multisite search capability, we intend to allow private organizations to keep their data private while still searching the public instance seamlessly.
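One way to picture the multisite search idea is the minimal sketch below. It is not Chemlab's design: the `Dataset` and `Site` classes, the `multisite_search` function, and the simple title-substring match are all hypothetical, chosen only to show a private site querying both itself and a public instance while its own private records stay local.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    title: str
    public: bool = False  # hypothetical flag: the owner opted to share this dataset


@dataclass
class Site:
    name: str
    datasets: list = field(default_factory=list)

    def search(self, term, include_private):
        """Return datasets whose title contains `term` (case-insensitive).

        Private datasets are returned only when `include_private` is True,
        i.e. when the query originates from the site that owns them.
        """
        term = term.lower()
        return [
            d for d in self.datasets
            if term in d.title.lower() and (d.public or include_private)
        ]


def multisite_search(term, local_site, public_site):
    """Search the local (private) site fully, then the public site's public data."""
    hits = [(local_site.name, d.title)
            for d in local_site.search(term, include_private=True)]
    hits += [(public_site.name, d.title)
             for d in public_site.search(term, include_private=False)]
    return hits
```

In this sketch the query flows only outward from the private site to the public one, so a search run on the public instance never sees the private organization's records.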