Postdoctoral researcher, UKP Lab, TU Darmstadt
I am an NLP researcher with a broad interest in Machine Learning. Currently, I am working as a postdoctoral fellow at the Ubiquitous Knowledge Processing (UKP) Lab. Earlier, I worked as a Senior Research Associate at the Laboratory for Computational Social Systems, IIT Delhi. My current research revolves around Large Language Models, focusing specifically on reasoning, prompt engineering, and interpretation. Additionally, I am interested in Temporal Graph Representation Learning. I completed my PhD in 2023 with my doctoral thesis titled Engagement to Persuasion: A Computational Study on Online Social Discourse. My doctoral research centered on the qualitative and quantitative analysis of online social platforms.
Reasoning with LLMs is one of my key research interests. I have been exploring different techniques to elicit stronger mathematical reasoning capabilities in relatively smaller models, such as separating out and finetuning problem-decomposition expertise for modular reasoning [EMNLP 2023] and reinforcement learning from tool-usage feedback [AAAI 2024]. I am currently working on mechanistic interpretation of LLM reasoning and knowledge retrieval [Preprint]. Additionally, I have been working on the nuances of in-context learning in low-resource settings. My work on cross-lingual in-context learning received an outstanding paper award at [ACL 2023]. You may check out my recent opinion piece on the reliability of AI assistants for science communication, published in the Communications of the ACM. In my doctoral research, I worked on aligning pretrained LMs with unsupervised finetuning towards better argument understanding [ACL 2022]. Earlier, I explored the possibilities of building compute-efficient Transformer architectures from the perspective of dynamical systems [NeurIPS 2021].
In my doctoral research, I worked on predictive modeling of user engagement in online platforms under various exogenous and endogenous influences [TKDE 2022][KDD 2020]. An important problem explored in my doctoral thesis was determining the interdependence between user opinions and network dynamics [WSDM 2022][PNAS Nexus 2023].
Primarily stemming from my doctoral research, I have been working on representation learning for temporal graphs and interaction networks, including inductive link prediction, incremental learning on large graphs, and geometric deep learning.
Check out my Google Scholar for an extensive list of publications.
This study explores the neural mechanisms of chain-of-thought (CoT) reasoning in LLMs. We find that LLMs employ multiple pathways for step-by-step reasoning, with a notable shift in functionality in the middle layers. Token representations initially favor information from pretraining but later transition to in-context information. This shift is also evident in attention heads: heads that generate answers dominate the later layers, while heads handling ontological relationships are prevalent in the initial layers.
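As a minimal illustration of the kind of layer-wise attention analysis involved (not the exact methodology of the paper), one can measure how much attention each layer's heads place on the in-context portion of a CoT prompt; the model name and the context/reasoning boundary below are placeholders:

```python
# Illustrative layer-wise attention probe; "gpt2" and context_len are
# placeholders, not the models or splits used in the study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = ("Q: Tom has 3 apples and buys 2 more. How many apples does he have? "
          "A: Let's think step by step.")
inputs = tok(prompt, return_tensors="pt")
context_len = 8  # hypothetical boundary between context and reasoning tokens

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
for layer_idx, attn in enumerate(out.attentions):
    last_token_attn = attn[0, :, -1, :]                         # (heads, seq)
    mass_on_context = last_token_attn[:, :context_len].sum(-1)  # per head
    print(f"layer {layer_idx:2d}: mean attention on context = "
          f"{mass_on_context.mean().item():.3f}")
```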
Reasoning with LLMs can be improved by separating the solver (typically a larger model) from the decomposer (typically a smaller model) and finetuning the latter to break down the problem based on feedback from the solver. The decomposer is finetuned as an agent interacting with the black-box environment constructed by the solver; that is, once a decomposer is trained, it can act with any solver of any scale.
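A minimal sketch of this interaction loop, assuming hypothetical decomposer_generate and solver_answer functions (the reinforcement learning on the solver's feedback during training is omitted here):

```python
# Hypothetical decomposer-solver interaction loop; the callables below are
# placeholders, not the released implementation.
from typing import Callable, List

def decompose_and_solve(
    question: str,
    decomposer_generate: Callable[[str, List[str]], str],  # smaller, finetuned model
    solver_answer: Callable[[str, List[str]], str],        # larger, black-box model
    max_steps: int = 5,
) -> List[str]:
    """Run a trained decomposer against an arbitrary black-box solver."""
    transcript: List[str] = []
    for _ in range(max_steps):
        # The decomposer proposes the next sub-question, conditioned on the
        # original question and the sub-question/answer transcript so far.
        sub_q = decomposer_generate(question, transcript)
        if sub_q.strip().lower() == "done":
            break
        # The solver answers the sub-question; its responses are the only
        # feedback the decomposer sees, so the solver is freely swappable.
        sub_a = solver_answer(sub_q, transcript)
        transcript.append(f"{sub_q} -> {sub_a}")
    return transcript
```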
Nearly random predictions in cross-lingual ICL with multilingual LLMs can be improved significantly via alignment: choose in-context examples that are semantically similar to the target input, and transfer task knowledge from the source language to the target using manually designed aligners.
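As a sketch of the example-selection half of this recipe (the manually designed task aligners are not shown), one could rank source-language demonstrations by multilingual embedding similarity; the encoder below is only an example choice:

```python
# Similarity-based demonstration selection for cross-lingual ICL (sketch).
# The embedding model is an example, not necessarily the one used in the paper.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def select_demonstrations(target_input: str, candidate_pool: list, k: int = 4):
    """Pick the k source-language examples most similar to the target input."""
    target_emb = encoder.encode(target_input, convert_to_tensor=True)
    pool_embs = encoder.encode([c["text"] for c in candidate_pool],
                               convert_to_tensor=True)
    scores = util.cos_sim(target_emb, pool_embs)[0]          # (len(pool),)
    top_idx = scores.topk(min(k, len(candidate_pool))).indices.tolist()
    return [candidate_pool[i] for i in top_idx]
```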
Pretrained LMs can be made aware of the signals of argumentative discourse via an unsupervised finetuning step using social discussion data. Additionally, prompt-based finetuning that is aligned with the pretraining objective can elicit generalizable argument understanding.
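A minimal sketch of what such pretraining-aligned, cloze-style prediction can look like; the template, verbalizer words, and encoder are illustrative placeholders rather than the exact setup of the paper:

```python
# Cloze-style argument scoring via the MLM head (illustrative only).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder encoder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

text = "Raising the minimum wage will reduce poverty."
template = f"{text} This argument is {tok.mask_token}."
verbalizers = {"convincing": "strong", "unconvincing": "weak"}  # label -> word

inputs = tok(template, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    mask_logits = model(**inputs).logits[0, mask_pos]        # (1, vocab_size)

scores = {label: mask_logits[0, tok.convert_tokens_to_ids(word)].item()
          for label, word in verbalizers.items()}
print(max(scores, key=scores.get))
```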
The Transformer architecture follows a close analogy with the temporal evolution of a multi-particle dynamical system. By incorporating an explicit depth-wise evolution operator, one can build compute- as well as parameter-efficient sequence-to-sequence architectures that are as good as the original Transformer, if not better.
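A toy sketch of this dynamical-systems view (not the NeurIPS 2021 architecture itself): treating depth as time, a single shared transition function F is applied with explicit Euler updates, so parameters do not grow with the number of layers:

```python
# Residual depth as explicit Euler integration: x_{l+1} = x_l + h * F(x_l).
import torch
import torch.nn as nn

class Transition(nn.Module):
    """A simple transition function F: self-attention followed by an MLP."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)
        return self.mlp(attn_out)

class EulerEncoder(nn.Module):
    """One shared F reused across depth 'time steps' (parameter-efficient)."""
    def __init__(self, d_model=64, n_heads=4, n_steps=6, step_size=1.0):
        super().__init__()
        self.F = Transition(d_model, n_heads)
        self.n_steps, self.h = n_steps, step_size

    def forward(self, x):
        for _ in range(self.n_steps):
            x = x + self.h * self.F(x)    # explicit Euler update in depth
        return x

x = torch.randn(2, 10, 64)                # (batch, seq_len, d_model)
print(EulerEncoder()(x).shape)            # torch.Size([2, 10, 64])
```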
You can reach me at subha0009 [at] gmail [dot] com