I am a computer scientist turned applied mathematician developing statistical methods to infer non-local properties of networks – like measures of long-range connectivity and causal effects – under realistic constraints of partial observations in the natural and social sciences. I am a Schmidt Science Fellow in the Institute for Data, Systems, and Society at Massachusetts Institute of Technology, working on causal inference in networked settings with Prof. Dean Eckles at MIT Sloan School of Management and Prof. Elchanan Mossel at MIT Mathematics.

I was previously an EPSRC Doctoral Prize Fellow in the Department of Mathematics at Imperial College London, where I also completed my PhD supervised by Prof. Nick S. Jones in the Systems and Signals group at Imperial and Prof. Sumeet Agarwal at IIT Delhi. I studied how people form social connections, and how social networks relate to their health, which involves research at the crossroads of network science, sociology and public health. In particular, I developed methods to infer the structure and statistics of large-scale social networks. My talks at Networks 2021 and Sunbelt 2020, and poster presented at IC2S2, capture some of my research on these topics. I also collaborate with the Vaccine Confidence Project at LSHTM and Social-Decision Making Lab at Cambridge to study misinformation and its impact on the COVID-19 pandemic.

Prior to my PhD I was a post-baccalaureate fellow supervised by Prof. James J. Collins at the Wyss Institute for Biologically Inspired Engineering at Harvard University. I worked at the intersection of machine learning and biology, developing technologies for drug discovery, molecular diagnostics, and biological circuit design. I was fortunate enough to work alongside some great experimental biologists, which led to the development of computational tools such as NeMoCAD for drug discovery and MALDI spectral analysis for universal infection diagnosis. This talk I gave at the DAIR seminar at IIT Delhi is a summary of my pursuit to bring mathematical order to biology.

I did my B.Tech in Computer Science and Engineering at IIT Delhi, where my research was primarily guided by Prof. Sumeet Agarwal. My bachelors’ thesis focused on building causal models of gene regulatory networks.

My broader research interest is to apply mathematics and machine learning to better understand how complex biological, cognitive and eventually social systems work–and perhaps more importantly, when they don’t work. And in turn, use the knowledge gained to refine notions of computation itself. I’ve written more deeply about my thoughts on that here.

Resume

GitHub

Email: firstinitiallastname@mit.edu

Research Philosophy

Computation is a vital tool for understanding things at all levels of organization of the world. Right from the fundamental particles of the universe, to the biomolecules that make life possible, to the wonders of biological intelligence, up to the intricate social interactions that create human civilization. There is a lot of shared structure of problems at various levels of the hierarchy, for instance the prevalance of graph structures at the biological, cognitive and societal levels, which is the recurring theme of my research. This pattern, when abstracted out, is precisely what permits the omnipotent use of computation.

Below, I have attempted to lay out my research along this ladder of abstraction.

Computation Core

For Biology

For Cognition

For Society

Computation Core

Partially observed networks

Geodesic statistics for random network families

A key task in the study of networked systems is to derive local and global properties that impact connectivity, synchronizability, and robustness. Computing shortest paths or geodesics in the network yields measures of node centrality and network connectivity that can contribute to explain such phenomena. We derive an analytic distribution of shortest path lengths, on the giant component in the supercritical regime or on small components in the subcritical regime, of any sparse (possibly directed) graph with conditionally independent edges, in the infinite-size limit. We provide specific results for widely used network families like stochastic block models, dot-product graphs, random geometric graphs, and graphons. The survival function of the shortest path length distribution possesses a simple closed-form lower bound which is asymptotically tight for finite lengths, has a natural interpretation of traversing independent geodesics in the network, and delivers novel insight in the above network families. Notably, the shortest path length distribution allows us to derive, for the network families above, important graph properties like the bond percolation threshold, size of the giant component, average shortest path length, and closeness and betweenness centralities. We also provide a corroborative analysis of a set of 20 empirical networks. This unifying framework demonstrates how geodesic statistics for a rich family of random graphs can be computed cheaply without having access to true or simulated networks, especially when they are sparse but prohibitively large

arXiv preprint

Networks 2021 talk

↑

X-t-SNE

Data visualization of multiple high-D spaces with associated graph structures

t-Stochastic Neighbor Embedding is a method of visualizing very high-dimensional data in 2 or 3 dimensions, that is now ubiquitiously used in data analysis across disciplines. The standard implementation allows visualization of a single feature space. We extend t-SNE to multiple feature spaces that could have an associated graph structure. That is, every data-point exists in multiple spaces, and all the data points are related in one or more graph structures. This extension is very useful since the generic graph structure can encode interesting domain knowledge that guides data visualization in the low-D space.

sahil loomba

Schmidt Science Fellow @ MIT IDSS

Research Philosophy

Computation Core

Partially observed networks

X-t-SNE

iWMMM

Hierarchical Ensemble Classifier

Computation for Biology

Biomolecules

NeMoCAD

Project THoR

Project SD2

protein2vec

Gaussian Process Models for Time-Series Omics Analysis

Project ConDDR

Causal Computational Models for GRNs

Evolvable Networks

Organisms

Project MALDI for Diagnosis

Game of Apoptosis

Project Abbie

Computation for Cognition

Human Category Learning

Human Form Learning

Human Motor Control

Complexity of Living and Biological Systems

Computation for Society

Social Systems

Learning Social Connectivity Kernels

Urban Mobility

Rationality in Economics

Wisdom of Crowds

Pedagogy in the Contemporary World

Multimodal Table of Content for Videos

Video Lecture Sequencing

Morality of Unsettling Interventions

Environment

Plankton and Ocean Health