CSU professor leverages ‘Data Revolution’ to solve current issues in chemistry

Robert Paton, an associate professor in Colorado State University’s Department of Chemistry, is part of a vanguard of researchers solving problems in chemistry with data science and machine learning.

He is also a member of a five-university team, supported by the National Science Foundation (NSF), that is charged with training a new generation of data chemists through the Center for Computer Assisted Synthesis (C-CAS). In addition to Paton, the C-CAS team includes Center Director Olaf Wiest as well as Nitesh Chawla, Abigail Doyle, Richmond Sarpong, and Matthew Sigman.

C-CAS combines data science and machine learning with chemistry to transform how the synthesis of complex organic molecules is planned and executed. Along the way, the Center will train a new generation of data chemists and machine learning scholars to address the complex challenges of modern synthetic chemistry.

“It’s exciting,” Paton said. “This idea that you can actually predict and design chemistry given all the complexity. One of the most satisfying things you can experience as a scientist is a successful prediction.”

Both graduate and undergraduate students will participate in the Center’s research. The Center will also establish networking events, online workshops, and collaborations with students at other schools.

C-CAS is supported by the Centers for Chemical Innovation Program of NSF’s Division of Chemistry and will receive $1.8 million in funding. The program creates two to three centers each year, and nine currently exist. As a “Phase One Center,” C-CAS will run for three years and, depending on the outcome, could be extended and considerably expanded into a “Phase Two Center.”

Data revolution

In 2017, NSF announced its “10 Big Ideas,” a long-term research agenda intended to benefit generations to come. The work of Paton and the C-CAS team falls under one of the 10: “Harnessing the Data Revolution.”

The “Data Revolution” describes the growing demand for data across all parts of society. It has affected many fields, and chemistry is now among them.

Currently, chemical knowledge is recorded in laboratory notebooks, in companies’ internal databases, or in the pages of Ph.D. theses. It may also be published in journal articles, posted online as PDFs, or captured in patents. The result is a wealth of information scattered across many places. Paton and his team are building new computational tools to bring all that data together in one accessible place. To do this, they will work in three phases:

  1. Unify data from a variety of sources.
  2. Use the unified data to represent chemistry in ways that address the challenges of optimizing chemical reactions.
  3. Apply the data to synthesis planning and the synthesis of complex molecules.

“Access to high-quality information, containing both positive and negative results, will be key to developing new data-driven tools for chemists,” Paton said. “This wealth of knowledge available by computer will open new pathways into chemistry for those who have previously found it inaccessible due to challenges like fume hoods and laboratory spaces.”

The C-CAS team

The Center will be directed by Wiest at the University of Notre Dame. He is joined by Chawla, also of Notre Dame; Doyle of Princeton University; Paton of CSU; Sarpong of the University of California, Berkeley; and Sigman of the University of Utah.

“By teaming up with a multi-institutional approach, we will be able to achieve more than could be achieved by a single investigator,” Paton said.

Each lead investigator brings complementary expertise to the Center’s goals. Paton’s group uses computational algorithms to understand the mechanisms of catalytic reactions and to improve catalyst performance.

Sigman develops physical-organic approaches to understand and predict selectivity in organic reactions. Doyle uses ultra-high-throughput experimentation (HTE) technology and machine learning to predict the outcomes of reactions.

Chawla works on fundamental advances in machine learning. Wiest uses both computational chemistry and experimental methods to elucidate reaction mechanisms and to perform high-throughput calculations on transition structures. Sarpong focuses on total synthesis, converting simpler chemical building blocks into complex, medicinally interesting natural products.

The project will also serve as a forum for exchanging ideas. Experts in each field will establish best practices openly, so that together they can identify the most useful tools for solving problems in chemistry.

Beyond the participating universities, the group will work with several industrial partners, including large pharmaceutical companies.

“The data these partners generate will be key for our progress with this work,” Paton said.

To learn more about C-CAS or to stay up to date on the research of Paton and his collaborators, visit ccas.nd.edu.