About Me

I'm a second-year M.S. student in the Master of Computational Data Science (MCDS) program at the Language Technologies Institute (LTI), School of Computer Science (SCS), Carnegie Mellon University. I primarily conduct research with two leading scholars in search and information retrieval (IR), Prof. Jamie Callan and Prof. Chenyan Xiong. My research focuses on RAG systems and DeepResearch systems, with strong interests in Agents and LLM-based query understanding. I expect to have 2–3 new publications released in Fall 2025.

In summer 2025, I worked at TikTok Inc. as a Machine Learning Engineer Intern, where I optimized TikTok Shop's recommendation system and improved both offline AUC and online A/B test performance through model and system enhancements.

Prior to joining CMU, I was a Senior Data Scientist at Tencent, leading projects in recommendation and risk modeling. My work contributed to large-scale user protection initiatives and revenue growth, and I was recognized as a top performer with multiple company awards.

I expect to graduate in May 2026, and I'm seeking full-time opportunities in MLE/RS in 2026!

🔥 News

  • 2025.10: New publication "Less LLM, More Documents" uploaded to arXiv
  • 2025.08: DeepResearchGym system surpassed 12 million search requests served!
  • 2025.05: Started Machine Learning Engineer Internship at TikTok Inc., San Jose
  • 2025.04: Completed the DeepResearchGym system; related paper uploaded to arXiv
  • 2024.12: Started working with Prof. Jamie Callan on Information Retrieval-related research
  • 2024.08: Started the Master's program at Carnegie Mellon University's School of Computer Science!
  • 2024.06: Wrapped up a 3-year journey as a Senior Data Scientist at Tencent
  • 2024.01: Officially promoted to T9 Senior Data Scientist at Tencent
  • 2023.07: Received "Outstanding Employee" (top 10%) performance evaluation at Tencent

📝 Publications

Less LLM More Documents

Less LLM, More Documents: Searching for Improved RAG

Ning, J.*, Kong, Y.*, Long, Y.*, & Callan, J. (2025)

Manuscript under review at ECIR 2026

Investigating the trade-offs between retriever corpus size and generator complexity for improved RAG performance.

DeepResearchGym

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Coelho, J., Ning, J., He, J., Mao, K., Paladugu, A., Setlur, P., ... & Xiong, C. (2025)

Manuscript under review at ICLR 2026

A unified framework for evaluating deep research with reproducible benchmarks and transparent evaluation metrics.

💼 Career

Research Assistant | Carnegie Mellon University

Dec 2024 – Present | Pittsburgh, PA

  • Pursuing research at the intersection of LLMs and Information Retrieval
  • Supervised by Prof. Jamie Callan (former SIGIR Chair)
  • Primarily researching RAG systems and DeepResearch systems, with strong interests in Agents and query understanding

Machine Learning Engineer Intern | TikTok Inc.

May 2025 – Aug 2025 | San Jose, CA

  • Optimized fine-ranking stage of TikTok Shop's recommendation system
  • Enhanced semantic alignment, yielding a +3.75% lift in offline user-level AUC
  • Improved online A/B test metrics: RPG@7 by +0.94% and RPG@14 by +1.12%

Senior Data Scientist | Tencent

Jul 2021 – Jun 2024 | Shenzhen, China

  • Led data science for Tencent Games' Minors Protection program (730K+ teenagers protected monthly)
  • Developed "Xinyue Mall" recommendation system from scratch (14.98% ARPU increase, ~$2.13M monthly revenue)
  • Recognized as top performer with multiple awards including "Outstanding Employee" (top 10%)

🎓 Education

Carnegie Mellon University

Master of Computational Data Science (MCDS) | Expected May 2026
Location: Pittsburgh, PA

The Chinese University of Hong Kong

Double Degree: BSc in Computer Science and BBA in Business Analytics | Jun 2021
Location: Hong Kong