Sound decisions require sound data. Researchers and organizations make decisions based on the data stored in data warehouses, so data quality is crucial. How do we ensure the data is correct?
Computer science graduate student Hajar Homayouni is answering that question. In recognition of her research in data warehouse assurance, CSU has selected Homayouni’s master’s thesis to represent the University in the Western Association of Graduate Schools (WAGS)/ProQuest Distinguished Master’s Thesis Award competition.
What is a data warehouse and how does it work?
Data warehouses are widely used in domains like science, healthcare, commerce, and industry. Storing data in a warehouse sounds simple, like piling boxes and old sports equipment in a storage unit.
But data warehouses do more than just store data. They also gather data from various sources, and then clean, integrate and translate it into a common form before storing it. These transformations, called Extract-Transform-Load (ETL) processes, make data easier to analyze and use in decision-making.
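To make the extract, transform and load steps concrete, here is a minimal sketch in Python. The two sources, their field names, date formats and units are all invented for illustration; they are not taken from Homayouni's work or from any real warehouse.

```python
from datetime import datetime

# Hypothetical records from two sources that describe the same kind of
# visit in different shapes (all names and values are assumptions).
clinic_rows = [{"patient": " 001 ", "visit_date": "03/14/2018", "weight_lb": "150"}]
lab_rows = [{"pid": "002", "date": "2018-03-15", "weight_kg": 70.0}]

LB_PER_KG = 2.20462


def transform_clinic(row):
    """Clean a clinic row and translate it into the common form."""
    return {
        "patient_id": row["patient"].strip(),
        # Convert US-style dates to ISO 8601.
        "visit_date": datetime.strptime(row["visit_date"], "%m/%d/%Y").date().isoformat(),
        # Convert pounds to kilograms so both sources share one unit.
        "weight_kg": round(float(row["weight_lb"]) / LB_PER_KG, 1),
    }


def transform_lab(row):
    """Lab rows already use ISO dates and kilograms; only rename fields."""
    return {
        "patient_id": row["pid"].strip(),
        "visit_date": row["date"],
        "weight_kg": float(row["weight_kg"]),
    }


def run_etl(clinic, lab):
    """Extract from both sources, transform each row, load into one list."""
    warehouse = [transform_clinic(r) for r in clinic]
    warehouse += [transform_lab(r) for r in lab]
    return warehouse


warehouse = run_etl(clinic_rows, lab_rows)
for record in warehouse:
    print(record)
```

After the run, every record in the warehouse has the same three fields, which is what makes the combined data easier to query and analyze.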
Translating data accurately
The problem is that ETL processes are complex and error-prone. These transformations rely on many types of source-to-target mappings, which are tedious to specify and often poorly documented. Homayouni is focusing on the accuracy of the mappings, investigating novel testing techniques, and developing ways to identify the mappings automatically and correctly.
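One simple way to picture testing a source-to-target mapping is to recompute the expected target value from each source row and compare it with what the warehouse actually holds. The sketch below does that in Python. It is only an illustration under assumed data and an assumed helper named check_mapping; it is not Homayouni's technique, which the article does not describe in detail.

```python
LB_PER_KG = 2.20462


def check_mapping(source_rows, target_rows, mapping, key):
    """Return a list of mismatches between source rows and target rows.

    mapping: {source_field: (target_field, convert_function)}
    key:     (source_key_field, target_key_field) used to pair up rows
    """
    problems = []
    src_key, tgt_key = key
    target_by_key = {row[tgt_key]: row for row in target_rows}

    # A row-count mismatch suggests rows were dropped or duplicated.
    if len(source_rows) != len(target_rows):
        problems.append(
            f"row count differs: {len(source_rows)} source, {len(target_rows)} target"
        )

    for src in source_rows:
        tgt = target_by_key.get(src[src_key])
        if tgt is None:
            problems.append(f"{src[src_key]}: missing in target")
            continue
        for src_field, (tgt_field, convert) in mapping.items():
            expected = convert(src[src_field])
            if tgt[tgt_field] != expected:
                problems.append(
                    f"{src[src_key]}: {tgt_field} is {tgt[tgt_field]}, expected {expected}"
                )
    return problems


# Hypothetical data: the second target row was loaded without unit conversion.
source = [{"pid": "001", "weight_lb": 150}, {"pid": "002", "weight_lb": 200}]
target = [
    {"patient_id": "001", "weight_kg": 68.0},
    {"patient_id": "002", "weight_kg": 200.0},
]
mapping = {"weight_lb": ("weight_kg", lambda lb: round(lb / LB_PER_KG, 1))}

problems = check_mapping(source, target, mapping, key=("pid", "patient_id"))
for p in problems:
    print(p)
```

Writing a check like this by hand for every field is exactly the tedious part, which is why identifying the mappings automatically is attractive.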
This research is valuable in many application domains, but Homayouni is testing it in a challenging arena – healthcare. Data warehouse assurance is critical in healthcare, where decisions can be urgent and the consequences of data errors high.
Working with researchers and developers at the Health Data Compass research group at the University of Colorado Anschutz Medical Center, Homayouni is applying her testing techniques to their health data warehouse. Using her approach, she has already improved their quality assurance process and found previously undetected faults in their test software.
Her research contributions have been published in the “Proceedings of the 22nd International Database Engineering and Applications Symposium” and appear in the book Advances in Computers.
Building research and accomplishments
Homayouni’s research and recognition are gathering steam. After receiving her bachelor’s degree in computer science from the University of Kashan in 2008, she came to CSU to pursue her M.S. in computer science, advised by Professors Sudipto Ghosh and Indrakshi Ray.
This year, she earned her master’s degree and received the Robert B. France Fellowship in Computer Science for her superior performance. The Graduate School has selected her master’s thesis, “An Approach for Testing the Extract-Transform-Load Process in Data Warehouse Systems,” to represent CSU at the upcoming WAGS/ProQuest awards.
The WAGS/ProQuest Distinguished Master’s Thesis Award recognizes distinguished scholarly achievement at the master’s level. Two thesis awards are given annually, one in STEM and one in non-STEM. Homayouni’s thesis will compete in the STEM category, which includes biological sciences, mathematical and physical sciences, life sciences and engineering.
The award includes a certificate, $1,000 for the recipient, and paid travel expenses to the WAGS conference and awards luncheon. Winners will be announced at the conference in March 2019.
In the meantime, Homayouni is continuing her computer science graduate studies at CSU. After she improved the quality assurance process at Anschutz Medical Center, the Health Data Compass group eagerly offered to work with her on Ph.D. research. She is taking them up on it and exploring her career options in industry and academia. Good luck at the WAGS competition, Hajar. We’re proud you are a CSU Ram!