Essays
Research essays and explorations on AI agents, simulations, and code generation.
Metaphi Simhub
Large language models have demonstrated superhuman capabilities in discrete, well-defined coding tasks, but their progression into truly agentic, collaborative software engineering partners is hampered by a fundamental limitation in training and evaluation. We introduce Metaphi Simhub, a platform designed to solve this challenge through interactive simulation environments.
From Static Benchmarks to Dynamic Worlds
The current paradigm of LLM evaluation suffers from a complete lack of interactivity: there is no "user" in the evaluation loop. We explore how principles from autonomous vehicle simulation can transform the training of code generation agents.