Projects

Embodied Reasoner (With OSPP's funding, 150+ GitHub Stars)

Main Contributor

Embodied-Reasoner (ER) is a multimodal model designed for deep reasoning and long-horizon interaction. Through OSPP, a program similar to GSoC, the AGIROS Community selected me from all applicants as the contributor in charge of ER. I have committed to testing ER on ALFRED and contributed to resolving two key bottlenecks: ambiguity among identical object instances and imprecise targeting of large objects. Resolving them further improved spatial accuracy and interaction robustness.

The Application of Reinforcement Learning for Agents in Automated Bidding Scenarios

Leader

This project explores strategic bidding in large-scale ad auctions using reinforcement learning. In a simplified simulator, we designed multiple agent types (truthful, conservative, aggressive, and adaptive) under Generalized Second-Price rules with budget and value uncertainty. Our main goal was to train RL-based bidding agents to maximize cumulative profit and ROI in uncertain, competitive environments. The project also integrates advanced reward shaping and lays the foundation for transfer to large-scale, real-world auction simulators such as the NeurIPS 2024 Auto-Bidding environment. As the team leader, I designed the full simulation framework and led the RL integration, analysis, and visualizations.
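For readers unfamiliar with Generalized Second-Price rules, here is a minimal sketch of the allocation step. It is an illustration only, not the project's simulator: the function name `gsp_allocate`, the bidder labels, and the omission of quality scores, click-through rates, and budgets are all simplifying assumptions.

```python
def gsp_allocate(bids, num_slots):
    """Simplified GSP: rank bidders by bid; each winner pays the next-highest bid.

    `bids` maps a (hypothetical) bidder name to its bid. Quality scores,
    click-through rates, and budgets are deliberately left out.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for slot in range(min(num_slots, len(ranked))):
        bidder, _ = ranked[slot]
        # The price for this slot is the bid ranked immediately below it,
        # or 0.0 when no lower bid exists.
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        results.append((slot, bidder, price))
    return results


example_bids = {"truthful": 5.0, "conservative": 3.0, "aggressive": 7.0}
allocation = gsp_allocate(example_bids, num_slots=2)
print(allocation)
```

With two slots, the highest bidder takes the top slot at the second-highest bid, and the second-highest bidder takes the next slot at the third-highest bid.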

View on GitHub

RAGEN & VAGEN: Training Agents by Reinforcing Reasoning (With 2.2k+ GitHub Stars)

Contributor

This pair of projects uses RL to help agents operate effectively in interactive, stochastic environments by handling multi-turn interactions and environmental uncertainty. I contributed additional environments and mask functions that compute the loss only over the tokens generated by the model, which made training more stable.
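The masking idea can be sketched in a few lines. This is a minimal, dependency-free illustration and not the RAGEN or VAGEN implementation: the function name `masked_nll` and the flat per-token list layout are assumptions made for the example.

```python
def masked_nll(token_logprobs, loss_mask):
    """Average negative log-likelihood over model-generated tokens only.

    `token_logprobs[i]` is the log-probability assigned to token i, and
    `loss_mask[i]` is 1 for tokens the model generated and 0 for tokens
    that came from the environment or the prompt (hypothetical layout).
    """
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have equal length")
    total = sum(-lp * m for lp, m in zip(token_logprobs, loss_mask))
    count = sum(loss_mask)
    # With no generated tokens there is nothing to train on.
    return total / count if count else 0.0


# Tokens 1 and 2 were generated by the model; tokens 0 and 3 were not.
loss = masked_nll([-0.5, -2.0, -1.0, -3.0], [0, 1, 1, 0])
print(loss)
```

Only the two masked-in tokens contribute to the average, so log-probabilities of observation or prompt tokens never affect the gradient signal.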

View on GitHub

WeaveWave: Towards Multimodal Music Generation

Main Contributor

WeaveWave explores multimodal music generation, aiming to create music from diverse inputs. It bridges existing MLLMs and text-to-music systems, proposes end-to-end architectures, and develops a unified generation framework with a custom training pipeline.

View on GitHub