Anna Zeng
PhD Candidate, MIT CSAIL
I combine human-centered design principles, honed at IDEO, with rigorous data science research at MIT CSAIL. My work focuses on making causal inference more accessible by automatically integrating external data sources and creating interpretable tools that help people understand complex causal relationships.


About
My journey bridges the worlds of human-centered design and data science. At IDEO, I learned to approach problems with empathy, creativity, and a deep understanding of human needs. This design thinking foundation now guides my technical research at MIT CSAIL.
My research focuses on making causal inference more accessible through better data management systems. I develop tools like Causal Data Integration that automatically mine unobserved attributes from external sources, and algorithms like CAMBA that generate interpretable summaries of complex causal graphs for domain experts.
Beyond causal inference, I've explored diverse areas from programming language adoption (my work on Rust barriers has been cited 29 times) to open knowledge networks. I believe that good research should bridge rigorous methodology with human-centered design, creating tools that don't just process data but help people understand and trust the insights they generate.
Design Background
- Human-Centered Design (IDEO)
- Design Thinking
- User Experience Research
Research Focus
- Causal Data Integration
- Graph Summarization
- Knowledge Networks
Research
My research focuses on improving causal inference through better data management systems. I work on automating the integration of external data sources and developing tools that make complex causal graphs more interpretable and actionable for domain experts.
Causal Data Integration
Developing systems that automatically mine unobserved attributes from external sources and build corresponding causal DAGs to address data management challenges in causal inference.
Causal Graph Summarization
Creating algorithms that generate interpretable and usable causal graph summaries from large, comprehensive causal graphs to aid domain experts in causal analysis tasks.
Open Knowledge Networks
Building infrastructure for rapid development of knowledge networks that enable database-like queries with wide coverage, supporting collaborative data curation across institutions.
Interested in collaboration or learning more about my research?
Let's discuss
Publications
My publications cover causal inference, data integration, and systems that make complex data analysis more accessible, and have appeared in venues including AI Magazine and CIDR.
Causal Data Integration
Published
Brit Youngmann, Michael Cafarella, Babak Salimi, Anna Zeng
arXiv preprint arXiv:2305.08741, 2023
Causal inference is fundamental to empirical scientific discoveries in natural and social sciences; however, in the process of conducting causal inference, data management problems can lead to false discoveries. We introduce the Causal Data Integration (CDI) problem, in which unobserved attributes are mined from external sources and a corresponding causal DAG is automatically built.
Causal Graph Summarization
Thesis
Anna Zeng
Massachusetts Institute of Technology, 2023
We introduce CAMBA, a prototype causal graph summarization algorithm that efficiently generates high-quality causal graph summaries that are interpretable and usable for causal inference. Existing graph summarization methods are not guaranteed to provide summarized graphs eligible for use in causal analysis tasks.
Infrastructure for rapid open knowledge network development
Published
Michael Cafarella, Michael Anderson, Iz Beltagy, Arie Cattan, Sarah Chasins, Ido Dagan, Doug Downey, Oren Etzioni, Sergey Feldman, Tian Gao, Tom Hope, Kexin Huang, Sophie Johnson, Daniel King, Kyle Lo, Yuze Lou, Matthew Shapiro, Dinghao Shen, Shivashankar Subramanian, Lucy Wang, Yuning Wang, Yitong Wang, Daniel Weld, Jenny Vo-Phamhi, Anna Zeng, Jiayun Zou
AI Magazine, 2022
This article describes a National Science Foundation Convergence Accelerator project to build a set of Knowledge Network Programming Infrastructure systems to address challenges in building, using, and scaling large knowledge networks.
Building a Shared Conceptual Model of Complex, Heterogeneous Data Systems: A Demonstration
Published
Michael R Anderson, Yuze Lou, Jiayun Zou, Michael J Cafarella, Sarah E Chasins, Doug Downey, Tian Gao, Kexin Huang, Dinghao Shen, Jenny M Vo-Phamhi, Yitong Wang, Yuning Wang, Anna Zeng
CIDR, 2022
We describe a working demonstration system that aims to build a shared conceptual model for data collaboration. This system borrows ideas from knowledge graphs and other massive collaborative efforts to curate data artifacts beyond the reach of any one person or institution.
Identifying barriers to adoption for Rust through online discourse
Published
Anna Zeng, Will Crichton
arXiv preprint arXiv:1901.01001, 2019
Rust is a low-level programming language known for its unique approach to memory-safe systems programming and for its steep learning curve. To understand what makes Rust difficult to adopt, we surveyed the top Reddit and Hacker News posts and comments about Rust.
For a complete list of publications and preprints, visit my academic profile:
Let's Connect
I'm always interested in discussing research collaborations, speaking opportunities, or exploring how human-centered design principles can make data science more accessible and impactful.
Get in Touch
Office
MIT CSAIL
32 Vassar Street
Cambridge, MA 02139
Office Hours
Tuesdays & Thursdays
2:00 PM - 4:00 PM EST
(or by appointment)