Sidak Pal Singh

Graduate student

ETH Zurich

Max Planck Institute for Intelligent Systems

Biography

I am a doctoral fellow within the Max Planck ETH Center for Learning Systems, where I am advised by Thomas Hofmann at ETH Zürich (home base) and Bernhard Schölkopf at MPI-IS Tübingen. My research interests are in the broad areas of deep learning theory, optimization, and causal representation learning.

Previously, I finished my Master’s degree in Data Science at EPFL, where I worked on model fusion and natural language processing via optimal transport, advised by Martin Jaggi. I did my master’s thesis at IST Austria with Dan Alistarh, focusing on efficient second-order approximations for compressing neural networks. Before EPFL, I completed my undergraduate studies in Computer Science at the Indian Institute of Technology (IIT) Roorkee.

For more details on my research, ~~check out this research statement~~ nay, that’s a bit too old. Please see the publications page instead. I’d be delighted to hear from you if you have similar interests and would like to collaborate.

Pro bono: I am happy to mentor motivated Master’s/Bachelor’s students who’d like to get a taste of research in these topics. Some potential directions:

Parametrically-efficient neural networks
Understanding Transformers
Higher-order optimization methods for neural networks
Multi-trajectory gradient optimization
Holistic understanding of batch size & learning rate

However, due to time constraints, please get in touch only if you are seriously interested.

Interests

Theory of Deep Learning
Optimization
Optimal Transport
Causality
Model Fusion

Education

Ph.D. in Computer Science (ongoing)
ETH Zürich, Max Planck Institute for Intelligent Systems

M.Sc. in Data Science, 2020
École Polytechnique Fédérale de Lausanne

B.Tech in Computer Science, 2017
Indian Institute of Technology, Roorkee

News

Our first work on understanding Transformers is to appear at NeurIPS’22.
September 1st, 2022: Back to Zürich 🇨🇭 — Alles Guet!
Attended the Les Houches summer school on Statistical Physics of Machine Learning: Amazing lectures + Pristine location + Cool people to hang out with.
💥 @ICLR’22: Double Descent is now proved in the general case of fully-connected Neural Networks. No strict assumptions about the structure or the optimizer needed — Hessian is All You Need 😋
Nominated as a PhD student member of the European Lab for Learning and Intelligent Systems (ELLIS).
💥 Excited to share the first paper from my PhD research @ NeurIPS’21. TLDR: Neural Networks provably have a much lower number of effective parameters; beautiful formulae included 🎉
Attended the Heidelberg Laureate Forum (amongst the top 225 young researchers selected in Maths and CS worldwide).
September 1st, 2021: Started my CLS exchange in the charming Tübingen 😄.
Participated in the Princeton Deep Learning Theory Summer School 2021 (really interesting although virtual).
Yay, 2 papers at NeurIPS 2020 (Model Fusion; WoodFisher)! :D
Gave a talk on Model Fusion at the DLCT reading group (slides).
Invited talk at the Google sparsity reading group on WoodFisher (slides).
Top 33% reviewer, certificate of appreciation ICML 2020.
September 1st, 2020: Moved to Zürich for my PhD! :)
Presented Context Mover’s Distance (CMD) at AISTATS 2020 (video).
Older News
- Participated in the Cornell, Maryland, Max Planck pre-doctoral research school 2020.
- The preprint based on my master’s thesis is online: WoodFisher: Efficient second-order approximations for model compression.
- Our Context Mover’s Distance paper is accepted at AISTATS 2020.
- Presented our work on Model Fusion via Optimal Transport at the OTML workshop in NeurIPS (2019).
- Received travel award & selected amongst top 50% reviewers, NeurIPS (2019).
- Our paper on Context Mover’s Distance & Barycenters is accepted at ICLR DeepGenStruct workshop (2019).
- Preprint of my internship work at FAIR is out, GLOSS: Generative Latent Optimization of Sentence Representations (2019).
- Preprint for our recent paper on Wasserstein is all you need (2018).
- Student grant to attend ML4P workshop in Oxford (2018).
- Excited to join Facebook AI Research (FAIR) as an intern this fall (2018).
- Selected for The Alan Turing Institute’s Data Study Group (2017).
- Offered a full-time Research Fellow position at Microsoft Research India (2017).
- Received Honda Y-E-S (Young Engineers and Scientists) Award 2016 (one of 14 students selected all over India).
- Awarded Google Venkat Panchapakesan Scholarship and invited visit to Google YouTube and Mountain View offices (2016).

Experience

Disclaimer: The ‘2020 Reflection(s)’ reflect only my own personal views! (They also serve as amusement.)

September 2018 – February 2019

Menlo Park, California

Research Intern

Facebook AI Research

Worked on building non-compositional embeddings for application in text representation and generation.

2020 Reflection: NLP is a super cool area, and I am really fascinated by linguistics & how languages evolve. But, I need a short break from NLP research!
Bonus reflection: No rush anyway, there is still some time until the septillion-parameter language model gets ~~brute-forced~~ efficiently implemented ;)

May 2016 – July 2016

Kyoto, Japan

Research Intern

Kyoto University

Developed a training mechanism for Generative Adversarial Networks (GANs) using entropy-regularized Wasserstein distances, guided by Marco Cuturi.
Utilized Large Margin Nearest Neighbors (LMNN) for learning the ground metric. Implemented the system in Chainer, with architectural inspiration from DCGAN.
2020 Reflection: Missed making it work before Wasserstein GAN :/ Nevertheless, what I learned about optimal transport eventually sparked the core ideas for my next two papers.
Bonus reflection: Indebted to the Honda Foundation for sponsoring this visit and to Marco Cuturi for teaching me about optimal transport.

November 2015 – January 2016

Bangalore, India

Research Intern

Xerox Research Centre

Developed a prototype of a multimodal trip planning system that integrates dynamic ridesharing with scheduled transportation services.
Used the k-medoids algorithm to find clusters of landmarks in the road network graph. Implemented a variant of the hill climbing algorithm & silhouette analysis to find the optimal number of clusters.
2020 Reflection: Interesting things can be done even without deep learning :P

May 2015 – July 2015

West Lafayette, Indiana

Summer Intern

Purdue University

Designed and implemented a method to estimate the relevance of reviews using their metadata, with a particular focus on reviews with limited votes.
Implemented a consumer Rating as a Service (RaaS) architecture with a RESTful API for interaction, both written using Node.js and Express, with MongoDB for persistence.
2020 Reflection: Here, I learned what research is and carried out my first research project :)

Publications

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

Phenomenology of Double Descent in Finite-Width Neural Networks

Analytic Insights into Structure and Rank of Neural Network Hessian Maps

WoodFisher: Efficient second-order approximations for model compression

Model Fusion via Optimal Transport

GLOSS: Generative Latent Optimization of Sentence Representations

Context Mover's Distance & Barycenters: Optimal transport of contexts for building representations.

Wasserstein is all you need

Recent Posts

How I became a vegetarian overnight?

How to obtain transcripts at IIT Roorkee?

Invictus