From flocks of birds to schools of fish, data can take many shapes


A flock of birds, represented as dots, illustrates the complexity of data at both local and global scales. Image credit: Henry Adams

When Henry Adams considers a vast store of data, such as a million points in three-dimensional space or the medical records of 1,000 patients, the Colorado State University assistant professor of mathematics thinks about how that data can be represented as a geometric shape.

Considering data as shapes can help mathematicians find new patterns in that data. Though a dataset may be too high-dimensional for the human mind to perceive on its own, the shape of a dataset can be illuminated with the aid of advanced computing.

Adams is at the forefront of an emerging field called topological data analysis, the merger of a pure mathematical field – topology – with an applied one – data science. For much of the 20th century, topology, or the study of shape and space, lived only in the realm of pure mathematics. But in the last two decades, topology has become increasingly quantitative, computable and statistical, according to Adams. Now a field of its own, topological data analysis describes shapes of data in multiple dimensions and resolutions, with new applications in fields like biology, materials science and economics.

An easy way to think about how data has shape, Adams says, is to consider biological systems like flocks of birds or schools of fish that exhibit collective behaviors. A video of starlings flying in a distinctive pattern, for example, “is an extremely complex dataset, with so many birds, and so many time steps,” Adams said. “So even just to get your mind around this dataset and start analyzing it is quite complicated.”

That’s where Adams’ work comes in: He, along with colleagues, has devised methods for taking vast datasets like flying birds and condensing them in ways that are easier to work with.

Lens for viewing data

Earlier this year, Adams and mathematics colleagues, including CSU Ph.D. graduate Lori Ziegelmeier, published a SIAM News piece summarizing how computational topology provides a lens through which mathematical modelers can tackle large datasets and glean insights from the shapes and patterns found therein. The researchers, along with recent CSU Ph.D. Rachel Neville, had organized a conference on the same subject the previous year, bringing together pure and applied mathematicians to advance the field of topological data science.

All this evolved from some afternoons in the Weber Building a few years prior, during meetings of an interdisciplinary group called the Pattern Analysis Laboratory.

Founded in 2007 by CSU math professors Michael Kirby and Chris Peterson, the Pattern Analysis Laboratory is a laid-back, open environment in which mathematicians from different disciplines get together and let their interests, not a set agenda, guide the conversation.

From one of those meetings, Adams says, the idea emerged for a paper the group eventually published in 2017 that has been cited over 200 times since then. The paper introduces an algorithm for taking a complex dataset and summarizing it in a lower-dimensional format, so that it can be fed into a machine learning algorithm.

“Persistence Images: A Stable Vector Representation of Persistent Homology” was co-authored by former CSU graduate students Tegan Emerson, Rachel Neville, Sofya Chepushtanova, Eric Hanson, Francis Motta and Lori Ziegelmeier, and mathematics professors Kirby, Peterson and Patrick Shipman.
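To give a rough sense of what summarizing topological features as a fixed-length vector can look like, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the topological features have already been computed as (birth, death) pairs, and the function name, grid size, Gaussian width and persistence-based weighting are illustrative choices. For simplicity, it evaluates unnormalized Gaussians at pixel centers rather than integrating over each pixel.

```python
import math


def persistence_image(diagram, resolution=5, sigma=0.1, extent=1.0):
    """Turn (birth, death) pairs into a flat vector of resolution**2 values.

    Hypothetical helper for illustration only; all parameters are assumptions.
    """
    # Re-express each feature as (birth, persistence), persistence = death - birth.
    points = [(birth, death - birth) for birth, death in diagram]
    step = extent / resolution
    image = []
    for row in range(resolution):
        y = (row + 0.5) * step  # persistence coordinate of the pixel center
        for col in range(resolution):
            x = (col + 0.5) * step  # birth coordinate of the pixel center
            value = 0.0
            for birth, pers in points:
                # Weight each Gaussian bump by persistence so that
                # short-lived features contribute less to the image.
                dist2 = (x - birth) ** 2 + (y - pers) ** 2
                value += pers * math.exp(-dist2 / (2 * sigma ** 2))
            image.append(value)
    return image


# Three made-up features: one long-lived, two short-lived.
diagram = [(0.1, 0.9), (0.2, 0.3), (0.4, 0.45)]
vector = persistence_image(diagram)
print(len(vector))
```

However many features the input contains, the output has the same length, which is what lets it be passed to a standard machine learning algorithm.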

The researchers’ work created new opportunities for applying topological principles to large datasets in fields ranging from archaeology to biology. For example, archaeologists in Poland used the CSU researchers’ topology principles to study cutaway patterns in ancient rock carvings. Now, Adams is working with chemists from Washington State University, CU Boulder and the University of Illinois at Urbana-Champaign to apply such principles to molecules in various energy regimes.

Adams credits the paper’s popularity to the Pattern Analysis Lab’s open and collaborative environment, along with a tendency among mathematicians in general to share ideas for pure enjoyment and inspiration. He continues to work on both the applications and theory of topological spaces.