Projects
Embodied Reasoner (OSPP-funded, 150+ GitHub stars)
Main Contributor
Embodied Reasoner (ER) is a multimodal model designed for deep reasoning and long-horizon interaction. Through OSPP, a program similar to Google Summer of Code (GSoC), the AGIROS community selected me from all applicants as the contributor responsible for ER. I was responsible for testing ER on ALFRED and helped resolve two key bottlenecks: ambiguity among identical object instances and imprecise targeting of large objects. These fixes improved spatial accuracy and interaction robustness.
The Application of Reinforcement Learning for Agents in Automated Bidding Scenarios
Leader
This project explores strategic bidding in large-scale ad auctions using reinforcement learning. In a simplified simulator, we designed several agent types (truthful, conservative, aggressive, and adaptive) that compete under Generalized Second-Price (GSP) rules with budget and value uncertainty. Our main goal was to train RL-based bidding agents that maximize cumulative profit and ROI in uncertain, competitive environments. The project also integrates reward shaping and lays the groundwork for transferring to large-scale, real-world auction simulators such as the NeurIPS 2024 Auto-Bidding environment. As team leader, I designed the full simulation framework and led the RL integration, analysis, and visualization.
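For readers unfamiliar with GSP, here is a minimal sketch of one auction round. It is not the project's simulator. The `Bidder` class, its `multiplier` field (used here to model truthful, conservative, and aggressive bidding), and `run_gsp` are hypothetical. Slots are ordered by click-through rate (CTR). The k-th highest bidder wins slot k and pays, per click, the bid ranked just below it. The adaptive type is not modeled. Under this sketch's assumptions, an adaptive or RL agent would adjust its `multiplier` between rounds.

```python
from dataclasses import dataclass


@dataclass
class Bidder:
    """Hypothetical bidding agent; `multiplier` models its bidding strategy."""
    name: str
    value: float       # private value per click
    budget: float      # remaining budget
    multiplier: float  # 1.0 truthful, < 1.0 conservative, > 1.0 aggressive

    def bid(self) -> float:
        # Shade the value by the strategy multiplier, capped by the remaining budget.
        return min(self.value * self.multiplier, self.budget)


def run_gsp(bidders, ctrs):
    """Run one GSP round over slots with the given CTRs (highest first).

    Returns a list of (name, slot, price_per_click, expected_profit) per winner.
    """
    # Fix every bid once, then rank highest first (stable sort keeps input order on ties).
    ranked = sorted(((b.bid(), b) for b in bidders), key=lambda t: t[0], reverse=True)
    outcomes = []
    for slot, ctr in enumerate(ctrs):
        if slot >= len(ranked):
            break  # more slots than bidders
        _, winner = ranked[slot]
        # GSP pricing: pay the next-highest bid per click, or 0 if nobody ranks below.
        price = ranked[slot + 1][0] if slot + 1 < len(ranked) else 0.0
        winner.budget -= ctr * price            # expected spend this round
        profit = ctr * (winner.value - price)   # expected profit this round
        outcomes.append((winner.name, slot, price, profit))
    return outcomes


if __name__ == "__main__":
    agents = [
        Bidder("truthful", value=2.0, budget=100.0, multiplier=1.0),
        Bidder("conservative", value=2.0, budget=100.0, multiplier=0.7),
        Bidder("aggressive", value=1.5, budget=100.0, multiplier=1.5),
    ]
    for outcome in run_gsp(agents, ctrs=[0.10, 0.05]):
        print(outcome)
```

In this example, the aggressive agent wins the top slot by overbidding. It then pays 2.0 per click (the truthful agent's bid) against a value of only 1.5, so its expected profit is negative. An RL bidder has to learn to avoid this kind of trade-off.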
View on GitHub
RAGEN & VAGEN: Training Agents by Reinforcing Reasoning (2.2k+ GitHub stars)
Contributor
These twin projects use RL to train agents that operate effectively in interactive, stochastic environments by handling multi-turn interactions and environmental uncertainty. I contributed additional environments and loss-mask functions that restrict the loss to tokens generated by the model, which made training more stable.
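In multi-turn RL, a trajectory interleaves environment observations with model outputs. Only the model outputs are actions the policy chose, so the loss should be averaged over those tokens alone. The framework-agnostic sketch below illustrates the idea. It is not RAGEN's or VAGEN's code. The segment format, the `"env"`/`"model"` role labels, and the helpers `build_loss_mask` and `masked_mean` are hypothetical. A real trainer would apply the same mask to per-token loss tensors.

```python
from typing import List, Tuple

# (role, token_ids). Hypothetical roles: "env" marks observations/feedback injected
# by the environment, "model" marks tokens the policy itself generated.
Segment = Tuple[str, List[int]]


def build_loss_mask(segments: List[Segment]) -> Tuple[List[int], List[int]]:
    """Flatten a multi-turn trajectory; mask is 1 for model-generated tokens, else 0."""
    tokens: List[int] = []
    mask: List[int] = []
    for role, ids in segments:
        tokens.extend(ids)
        mask.extend([1 if role == "model" else 0] * len(ids))
    return tokens, mask


def masked_mean(per_token_loss: List[float], mask: List[int]) -> float:
    """Average the per-token loss over masked-in (model-generated) tokens only."""
    if len(per_token_loss) != len(mask):
        raise ValueError("loss and mask must have the same length")
    count = sum(mask)
    if count == 0:
        return 0.0  # no model-generated tokens in this trajectory
    return sum(loss * m for loss, m in zip(per_token_loss, mask)) / count


if __name__ == "__main__":
    trajectory = [
        ("env", [101, 102, 103]),  # initial observation
        ("model", [201, 202]),     # agent's reasoning and action
        ("env", [104, 105]),       # environment feedback
        ("model", [203]),          # next action
    ]
    tokens, mask = build_loss_mask(trajectory)
    print(mask)  # [0, 0, 0, 1, 1, 0, 0, 1]
    # The large losses on environment tokens are ignored by the mask.
    losses = [0.5, 0.5, 0.5, 1.0, 2.0, 9.0, 9.0, 3.0]
    print(masked_mean(losses, mask))  # 2.0
```

Without the mask, the environment tokens would pull the average toward text the agent never produced. Excluding them is one plausible reason such masking stabilizes training.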
View on GitHub
WeaveWave: Towards Multimodal Music Generation
Main Contributor
WeaveWave explores multimodal music generation, aiming to create music from diverse inputs. It bridges existing multimodal large language models (MLLMs) and text-to-music systems, proposes end-to-end architectures, and develops a unified generation framework with a custom training pipeline.
View on GitHub