Anna Zeng

PhD Candidate, MIT CSAIL

I combine human-centered design principles, honed at IDEO, with rigorous data science research at MIT CSAIL. My work focuses on making causal inference more accessible by automatically integrating external data sources and creating interpretable tools that help people understand complex causal relationships.

About

My journey bridges the worlds of human-centered design and data science. At IDEO, I learned to approach problems with empathy, creativity, and a deep understanding of human needs. This design thinking foundation now guides my technical research at MIT CSAIL.

My research focuses on making causal inference more accessible through better data management systems. I develop systems for Causal Data Integration (CDI) that automatically mine unobserved attributes from external sources, and algorithms like CAMBA that generate interpretable summaries of large causal graphs for domain experts.

Beyond causal inference, I've explored areas ranging from programming language adoption (my work on Rust barriers has been cited 29 times) to open knowledge networks. I believe good research should pair rigorous methodology with human-centered design, creating tools that don't just process data but help people understand and trust the insights they generate.

Design Background

  • Human-Centered Design (IDEO)
  • Design Thinking
  • User Experience Research

Research Focus

  • Causal Data Integration
  • Graph Summarization
  • Knowledge Networks

Research

My research focuses on improving causal inference through better data management systems. I work on automating the integration of external data sources and developing tools that make complex causal graphs more interpretable and actionable for domain experts.

Active

Causal Data Integration

Developing systems that automatically mine unobserved attributes from external sources and build corresponding causal DAGs to address data management challenges in causal inference.

Causal Inference · Data Integration · Automated Discovery

Active

Causal Graph Summarization

Creating algorithms that generate interpretable and usable causal graph summaries from large, comprehensive causal graphs to aid domain experts in causal analysis tasks.

Graph Algorithms · Causal Analysis · Data Visualization

Active

Open Knowledge Networks

Building infrastructure for rapid development of knowledge networks that enable database-like queries with wide coverage, supporting collaborative data curation across institutions.

Knowledge Graphs · Data Infrastructure · Collaborative Systems

Interested in collaboration or learning more about my research?

Publications

My publications span causal inference, data integration, and systems that make complex data analysis more accessible, appearing in venues including AI Magazine and CIDR as well as on arXiv.

Causal Data Integration

Published

Brit Youngmann, Michael Cafarella, Babak Salimi, Anna Zeng

arXiv preprint arXiv:2305.08741 2023

Causal inference is fundamental to empirical scientific discoveries in natural and social sciences; however, in the process of conducting causal inference, data management problems can lead to false discoveries. We introduce the Causal Data Integration (CDI) problem, in which unobserved attributes are mined from external sources and a corresponding causal DAG is automatically built.

Causal Graph Summarization

Thesis

Anna Zeng

Massachusetts Institute of Technology 2023

We introduce CAMBA, a prototype causal graph summarization algorithm that efficiently generates high-quality causal graph summaries that are interpretable and usable for causal inference. Existing graph summarization methods are not guaranteed to provide summarized graphs eligible for use in causal analysis tasks.

Infrastructure for rapid open knowledge network development

Published

Michael Cafarella, Michael Anderson, Iz Beltagy, Arie Cattan, Sarah Chasins, Ido Dagan, Doug Downey, Oren Etzioni, Sergey Feldman, Tian Gao, Tom Hope, Kexin Huang, Sophie Johnson, Daniel King, Kyle Lo, Yuze Lou, Matthew Shapiro, Dinghao Shen, Shivashankar Subramanian, Lucy Wang, Yuning Wang, Yitong Wang, Daniel Weld, Jenny Vo-Phamhi, Anna Zeng, Jiayun Zou

AI Magazine 2022

This article describes a National Science Foundation Convergence Accelerator project to build a set of Knowledge Network Programming Infrastructure systems to address challenges in building, using, and scaling large knowledge networks.

Building a Shared Conceptual Model of Complex, Heterogeneous Data Systems: A Demonstration

Published

Michael R Anderson, Yuze Lou, Jiayun Zou, Michael J Cafarella, Sarah E Chasins, Doug Downey, Tian Gao, Kexin Huang, Dinghao Shen, Jenny M Vo-Phamhi, Yitong Wang, Yuning Wang, Anna Zeng

CIDR 2022

We describe a working demonstration system that aims to build a shared conceptual model for data collaboration. This system borrows ideas from knowledge graphs and other massive collaborative efforts to curate data artifacts beyond the reach of any one person or institution.

Identifying barriers to adoption for Rust through online discourse

Published

Anna Zeng, Will Crichton

arXiv preprint arXiv:1901.01001 2019

Rust is a low-level programming language known for its unique approach to memory-safe systems programming and for its steep learning curve. To understand what makes Rust difficult to adopt, we surveyed the top Reddit and Hacker News posts and comments about Rust.

For a complete list of publications and preprints, visit my academic profile.

Let's Connect

I'm always interested in discussing research collaborations, speaking opportunities, or exploring how human-centered design principles can make data science more accessible and impactful.

Get in Touch

Office

MIT CSAIL
32 Vassar Street
Cambridge, MA 02139

Office Hours

Tuesdays & Thursdays
2:00 PM - 4:00 PM EST
(or by appointment)
