
🏆 Foundation Models
HunyuanVideo-Foley, released by Tencent, is an AI-powered sound design tool for video creators. It generates professional-grade sound effects that stay synchronized with the video, even in complex scenes. Using multimodal semantic balancing, the system analyzes both visual and textual inputs to produce personalized, context-aware audio. Potential applications include short-form video creation, filmmaking, advertising, and game development.
Marvis TTS is an advanced conversational speech model designed for real-time voice cloning and streaming text-to-speech synthesis. It runs efficiently on consumer devices such as Apple Silicon and needs only 10 seconds of audio to clone a voice. With intelligent text processing and streaming audio generation, Marvis delivers natural, multilingual speech, with broader language coverage planned.
🛠️ Frameworks & Essential Tools
LightThinker is a method for improving the reasoning efficiency of large language models (LLMs). It dynamically compresses intermediate reasoning steps into concise representations, which reduces the number of tokens kept in the context window. Inspired by human cognition, LightThinker combines data construction, hidden-state mapping, and specialized attention masks to shorten reasoning chains.
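To make the attention-mask idea concrete, here is a toy sketch in Python. It is not LightThinker's actual implementation: the sequence layout (prompt, raw thought, compressed tokens, continuation) and the function name `build_compression_mask` are assumptions made for illustration. The point it shows is that once a thought has been compressed, later tokens attend to the compressed tokens instead of the raw reasoning step.

```python
from typing import List


def build_compression_mask(
    n_prompt: int, n_thought: int, n_gist: int, n_next: int
) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key k.

    Assumed (hypothetical) layout of the sequence:
        [ prompt | raw thought | compressed tokens | continuation ]
    """
    thought_start = n_prompt
    gist_start = thought_start + n_thought
    next_start = gist_start + n_gist
    total = next_start + n_next

    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: a token never sees the future
            # Continuation tokens are blocked from the raw thought, so they
            # can only reach it through the compressed tokens.
            blocked = q >= next_start and thought_start <= k < gist_start
            mask[q][k] = not blocked
    return mask


if __name__ == "__main__":
    # 2 prompt tokens, 3 thought tokens, 1 compressed token, 2 continuation tokens
    for row in build_compression_mask(2, 3, 1, 2):
        print("".join("1" if allowed else "0" for allowed in row))
```

Because the continuation rows never reference the raw thought positions, those key/value entries could be dropped from the cache after compression, which is where the token savings described above would come from in this toy setup.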
Gonzo is a Go-based terminal UI tool for real-time log analysis. It ingests logs from stdin, files, or the network, and offers automatic format detection, severity tracking, interactive dashboards, and AI-driven anomaly detection. Gonzo streamlines monitoring and troubleshooting by making large-scale log data more manageable and actionable.
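As a rough illustration of what automatic format detection and severity tracking can involve, the sketch below classifies each log line as JSON, logfmt, or plain text and tallies severities. It is not Gonzo's code (Gonzo is written in Go and its internals are not described here); the function names, the `level`/`severity` field names, and the heuristics are all assumptions chosen for the example.

```python
import json
import re
from collections import Counter
from typing import Iterable

# key=value pairs, where the value is either quoted or a run of non-spaces
LOGFMT_PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')
SEVERITY_WORD = re.compile(r"\b(DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)\b", re.IGNORECASE)


def detect_format(line: str) -> str:
    """Guess whether a line is 'json', 'logfmt', or plain 'text'."""
    text = line.strip()
    if text.startswith("{"):
        try:
            if isinstance(json.loads(text), dict):
                return "json"
        except json.JSONDecodeError:
            pass  # looked like JSON but was not; fall through
    if len(LOGFMT_PAIR.findall(text)) >= 2:
        return "logfmt"
    return "text"


def extract_severity(line: str) -> str:
    """Return a normalized severity, or 'UNKNOWN' if none is found."""
    fmt = detect_format(line)
    level = None
    if fmt == "json":
        record = json.loads(line)
        level = record.get("level") or record.get("severity")
    elif fmt == "logfmt":
        fields = {key: value.strip('"') for key, value in LOGFMT_PAIR.findall(line)}
        level = fields.get("level") or fields.get("severity")
    if level is None:
        # Fall back to scanning the raw line for a severity keyword.
        match = SEVERITY_WORD.search(line)
        level = match.group(1) if match else "UNKNOWN"
    level = str(level).upper()
    return "WARN" if level == "WARNING" else level


def count_severities(lines: Iterable[str]) -> Counter:
    """Tally severities across a stream of log lines."""
    return Counter(extract_severity(line) for line in lines)


if __name__ == "__main__":
    sample = [
        '{"level": "error", "msg": "disk full"}',
        'ts=2024-01-01T00:00:00Z level=warn msg="slow query"',
        "2024-01-01 12:00:00 INFO server started",
        "a line with no severity at all",
    ]
    print(count_severities(sample))
```

A real tool would have to handle many more formats and edge cases; the sketch only shows the general shape of detecting a format per line before extracting fields from it.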
🤖 Agent Development
ELL-StuLife (Self-Evolving Agent via Experience-Driven Lifelong Learning) provides a framework for building agents that continuously grow through real-world interactions. Unlike static continual learning, ELL emphasizes learning from experience by combining exploration, long-term memory, skill acquisition, and knowledge internalization. This enables agents to generate rich experiential data, refine their abilities iteratively, and evolve over time.
📊 Data & Instruction
We-Math 2.0 is a unified system aimed at improving the mathematical reasoning of multimodal large language models (MLLMs). It integrates a structured math knowledge base, model-centric data space modeling, and reinforcement learning training paradigms. The result is stronger reasoning performance across a wide range of mathematical concepts and difficulty levels.