| Zhipu AI (Z.ai) |
GLM-5.2 Open |
Text |
2026-06 |
1M |
$1.40/1M tokens |
$4.40/1M tokens |
753B params; rolled out to all GLM Coding Plan tiers (Lite/Pro/Max/Team); standalone API + MIT open weights |
| MiniMax |
MiniMax M3 Open |
Multimodal |
2026-06 |
1M |
$0.30/1M tokens |
$1.20/1M tokens |
428B params (23B active); 1-million-token context window; novel attention mechanism called MSA; native multimodality |
| Moonshot AI |
Kimi-K2.7-Code Open |
Code |
2026-06 |
256K |
$0.95/1M tokens |
$4.00/1M tokens |
1T total params / 32B active, 384 experts, ~30% lower reasoning-token usage vs K2.6 |
| StepFun |
Step 3.7 Flash Open |
Multimodal |
2026-06 |
256K |
$0.20/1M tokens |
$1.15/1M tokens |
198B total / 11B active params; 1.8B ViT vision encoder; 3 reasoning levels (low/med/high); ~400 TPS; Apache 2.0; supports NVIDIA NIM on-prem deployment |
| Meituan |
LongCat-2.0 Open |
Text |
2026-06 |
1M |
|
|
1.6T-param MoE; trained on 50,000 domestic Chinese chips; next-gen flagship |
| Alibaba |
Qwen3.7-Max Closed |
Text |
2026-05 |
1M |
$2.50/1M tokens |
$7.50/1M tokens |
Flagship closed-source reasoning model. Ranked #1 on Artificial Analysis Intelligence Index (57/100) out of 218 models at release. 1M token context window. Text-only (no multimodal). Architecture reportedly dual-72B. Extended thinking / chain-of-thought mode. Released May 19, 2026. |
| Baidu |
ERNIE 5.1 Closed |
Text |
2026-05 |
128K |
$0.59/1M tokens |
$2.65/1M tokens |
MoE successor to ERNIE 5.0 with ~⅓ the total params, ~½ the active params, and 6% of the pre-training cost. AIME 2026: 99.6% with tool use (#2 globally). Arena Search: 1,223 (#4 global, #1 China). Released May 9, 2026. |
| Baichuan AI |
Baichuan-M4 Closed |
Text |
2026-05 |
— |
|
|
Medical AI flagship (May 2026). World #1 on HealthBench, HealthBench Hard & HealthBench Professional. Hallucination rate 3.3% — industry low via factuality-aware RL. Surpasses GPT-5.5, Opus 4.7, DeepSeek-V4-Pro. Powers 百小医 AI family doctor. |
| ModelBest |
MiniCPM-V 4.6 Open |
Multimodal |
2026-05 |
260K |
|
|
1.3B multimodal model (SigLIP2 + Qwen3.5-0.8B). 260K token context. Runs on consumer phones (iOS/Android/HarmonyOS). Launched May 2026. |
| ModelBest |
MiniCPM5-1B Open |
Text |
2026-05 |
128K |
|
|
1B language model with 128K token context. SOTA on-device LLM at its size class. Apache 2.0. |
| Zhipu AI (Z.ai) |
GLM-5.1 Open |
Text |
2026-04 |
200K |
$0.98/1M tokens |
$3.08/1M tokens |
754B flagship — top scores on AIME 2026 (95.3) and GPQA-Diamond (86.2); strong agentic and coding ability |
| MiniMax |
MiniMax-M2.7 Open |
Agent |
2026-04 |
1M |
$0.28/1M tokens |
$1.20/1M tokens |
229B MoE; self-evolving agent — 30% perf gain over 100+ rounds; agent teams, complex skills, 24/7 background agents |
| Moonshot AI |
Kimi K2.6 Open |
Multimodal |
2026-04 |
256K |
$0.60/1M tokens |
$2.50/1M tokens |
1T MoE (32B active), 256K ctx; native multimodal agentic model with top SWE-bench (80.2) and AIME 2026 (96.4) scores |
| DeepSeek |
DeepSeek-V4-Pro Open |
Text |
2026-04 |
1M |
$0.435/1M tokens |
$0.87/1M tokens |
Flagship MoE: 1.6T total / 49B active params, 1M context, FP4+FP8 precision, built-in Think/Non-Think reasoning modes. Codeforces rating 3206. |
| DeepSeek |
DeepSeek-V4-Flash Open |
Text |
2026-04 |
1M |
$0.14/1M tokens |
$0.28/1M tokens |
Efficient MoE: 284B total / 13B active params, 1M context. Same architecture as V4-Pro at far lower compute cost. Codeforces 2816. |
| Baidu |
ERNIE-Image Open |
Image Gen |
2026-04 |
— |
$0.03/image |
|
8B single-stream Diffusion Transformer (DiT). Best text rendering in open source (LongTextBench: 0.9733). Up to 4K output. Apache 2.0. Released Apr 15, 2026. |
| Ant Group |
Ring-2.6-1T Open |
Reasoning |
2026-04 |
262K |
$0.075/1M tokens |
$0.625/1M tokens |
1T total params, ~63B active per token; adaptive reasoning via high / xhigh modes; up to 66K output tokens; open weights (MIT); released May 8, 2026; free tier available on OpenRouter |
| Ant Group |
Ling-2.6-1T Open |
Text |
2026-04 |
262K |
$0.075/1M tokens |
$0.625/1M tokens |
1T total params, ~63B active per token; fast-thinking approach cuts token cost to ~1/4 of comparable models; open weights (MIT); released Apr 23, 2026; free tier available on OpenRouter via Novita AI |
| Ant Group |
Ling-2.6-flash Open |
Text |
2026-04 |
262K |
$0.10/1M tokens |
$0.30/1M tokens |
Efficient sparse MoE: 104B total / 7.4B active, hybrid linear attention. 340 tokens/s on 4× H20. Strong agent and tool-use performance. |
| Tencent |
Hy3-preview Closed |
Text |
2026-04 |
256K |
¥1.2/1M tokens |
¥4.0/1M tokens |
295B MoE (21B active); reasoning, coding, and agentic workloads. Via Tencent Cloud TokenHub. Released Apr 23, 2026. |
| Xiaomi |
MiMo-V2.5-Pro Open |
Text |
2026-04 |
1M |
$0.0036/1M tokens |
$0.087/1M tokens |
1.02T MoE, 42B active; KV-cache reduced ~7×; latest generation |
| Xiaomi |
MiMo-V2-Pro Closed |
Text |
2026-03 |
1M |
$0.0036/1M tokens |
$0.087/1M tokens |
1T+ MoE, 42B active; hybrid attention; 1M token context |
| Unitree Robotics |
UnifoLM-VLA-0 Open |
Embodied |
2026-03 |
N/A |
|
|
Vision-Language-Action model enabling the G1 humanoid to autonomously perform household tasks from natural language commands. Runs onboard the robot. Open-sourced March 2026 — Unitree's first AI model release. |
| Kunlun Tech |
SkyReels V4 Closed |
Video Gen |
2026-03 |
— |
|
|
Audio-visual creation model (Mar 2026). Dual-stream architecture; #1 globally on Text-to-Video (With Audio) and Image-to-Video (With Audio) tracks at release. Generates clips up to 3 minutes. |
| Kunlun Tech |
Mureka V9 Closed |
Audio |
2026-03 |
— |
|
|
Music generation model (Mar 2026). Paragraph-level text control; enhanced mixing quality, vocal expression, and style richness. |
| Kunlun Tech |
Matrix-Game 3.0 Closed |
Video Gen |
2026-03 |
— |
|
|
Physics-simulation interactive world model (Mar 2026). 5B params; 720P @ 40FPS; covers 1,000+ scenarios with Unreal Engine data. Industrial-grade real-time interactivity. |
| ByteDance |
Seed2.0 Pro Closed |
Multimodal |
2026-02 |
272K |
$0.47/1M tokens |
$2.37/1M tokens |
ByteDance's flagship multimodal model. Understands text, image, and video. Ranks #3 globally on LMSYS Vision Arena and #6 on overall text arena. AIME 2025: 98.3, SWE-bench: 76.5%. |
| ByteDance |
Seedream 5.0 Lite Closed |
Image Gen |
2026-02 |
— |
$0.026/image |
|
Unified multimodal image generation with chain-of-thought visual reasoning, real-time web search, and native editing. Supports up to 14 reference images. Up to 4K output at 2048×2048. |
| ByteDance |
Seedance 2.0 Closed |
Video Gen |
2026-02 |
— |
|
|
Unified audio-video joint generation model. Accepts text, image, video, and audio inputs; outputs native 2K video (up to 15s) with synchronized audio. 30% faster than Seedance 1.5 Pro. |
| Alibaba |
Qwen3.5 Open |
Multimodal |
2026-02 |
262K (1M w/ YaRN) |
|
|
Unified vision-language MoE family. Flagship variants: 35B-A3B (35B total / 3B active) and 397B-A17B. Thinking mode on by default. Supports image, video, and text input. 201 languages. Apache 2.0. |
| Alibaba |
Qwen3-Coder-Next Open |
Code |
2026-02 |
256K |
|
|
80B total / 3B active MoE coding agent. Excels at long-horizon agentic tasks, tool use, and IDE integration (Claude Code, Cline, Qwen Code). No thinking mode. |
| Zhipu AI (Z.ai) |
GLM-5 Open |
Text |
2026-02 |
200K |
$1.40/1M tokens |
$4.40/1M tokens |
754B MoE (40B active); 28.5T token pre-training; top-tier SWE-bench score of 77.8 |
| MiniMax |
MiniMax-M2.5 Open |
Agent |
2026-02 |
1M |
$0.30–$1.00/hr |
$0.30–$1.00/hr |
229B; SOTA SWE-Bench (80.2); 80% of MiniMax's own code generated by this model; M2.5-Lightning at 100 tok/s |
| Baidu |
ERNIE 5.0 Closed |
Multimodal |
2026-02 |
128K |
|
|
2.4 trillion parameter unified multimodal (text + image + video + audio) in a single autoregressive framework. LMArena Text: 1,460 (#1 China, #8 global); Vision: 1,226 (#1 China, #8 global). Released Feb 6, 2026. |
| Ant Group |
LingBot-VLA Open |
Embodied |
2026-02 |
|
|
|
Open-source VLA foundation model trained on ~20,000 hours of real-world dual-arm robot data across 9 embodiments |
| Ant Group |
Ring-2.5-1T Open |
Text |
2026-02 |
256K |
|
|
World's first open-source 1T-parameter thinking model. Gold medal level at IMO 2025 (35/42 pts) and CMO 2025 (105/126). 3× throughput for sequences >32K. |
| Ant Group |
Ming-flash-omni-2.0 Open |
Multimodal |
2026-02 |
— |
|
|
Any-to-any omni model: accepts image, text, video, and audio; outputs image, text, and audio. 100B total / 6B active MoE. Supports zero-shot voice cloning and image generation/editing. |
| Ant Group |
LLaDA-2.1-flash Open |
Text |
2026-02 |
— |
|
|
Novel diffusion-based language model — not autoregressive. 103B params. Generates text by iterative token editing rather than left-to-right decoding. 102K monthly downloads. |
| Baichuan AI |
Baichuan-M3 Open |
Text |
2026-02 |
— |
|
|
235B medical model (Feb 2026) built on Qwen3 architecture. Former world #1 on HealthBench. Hallucination rate 3.5%. Outperforms human doctors in diagnostic accuracy. 48GB VRAM with W4 quantization. |
| Moonshot AI |
Kimi K2.5 Open |
Multimodal |
2026-01 |
256K |
$0.40/1M tokens |
$1.90/1M tokens |
1T MoE (32B active); trained on 15T vision+text tokens; thinking & instant modes; agent swarm support |
| DeepSeek |
DeepSeek-OCR-2 Open |
Multimodal |
2026-01 |
— |
|
|
3B visual OCR model with document-to-markdown conversion, layout-aware grounding, and dynamic resolution. 1.66M monthly downloads. |
| Tencent |
HunyuanImage 3.0 Open |
Image Gen |
2026-01 |
— |
|
|
80B MoE (13B active); text-to-image and image-to-image with reasoning. Open weight on HuggingFace. Released Jan 26, 2026. |
| Meituan |
LongCat-Flash-Lite Open |
Text |
2026-01 |
256K |
|
|
68.5B MoE, 2.9–4.5B active; fast and efficient |
| AIsphere |
PixVerse R1 Closed |
Video Gen |
2026-01 |
N/A |
|
|
Real-time world model; 1080p, <15s latency, physics-aware, infinite temporal continuity |
| SenseTime |
SenseNova V6.5 Omni Closed |
Multimodal |
2026-01 |
— |
|
|
Real-time multimodal streaming model. Powers SenseChat's live audio/video interaction. Successor to V6 Pro. |
| DeepSeek |
DeepSeek-V3.2 Open |
Text |
2025-12 |
128K |
$0.252/1M tokens |
$0.378/1M tokens |
685B MoE with DeepSeek Sparse Attention (DSA). V3.2-Speciale variant won gold at 2025 IMO and IOI. 4.16M monthly downloads on HuggingFace. |
| Kuaishou |
Kling I2V 2.0 Closed |
Video Gen |
2025-12 |
N/A |
$0.16/video (5s) |
|
Image-to-video with industry-leading subject preservation |
| Xiaomi |
MiMo-V2-Flash Open |
Text |
2025-12 |
N/A |
$0.01/1M tokens |
$0.30/1M tokens |
309B MoE, 15B active; trained on 27T tokens; open-source release |
| AIsphere |
PixVerse V5.5 Closed |
Video Gen |
2025-12 |
N/A |
|
|
Text/image-to-video; HD output, multiple aspect ratios, character consistency |
| Tencent |
Hunyuan3D 3.0 Closed |
3D Gen |
2025-11 |
— |
|
|
3D asset generation from text, image, or sketch input |
| iFlyTek |
Spark X1.5 Closed |
Text |
2025-11 |
— |
|
|
MoE reasoning model (29.3B total / 3B active parameters). Supports 130+ languages. Runs on a single Huawei Ascend server. Launched Nov 2025. |
| MiniMax |
MiniMax-M2 Open |
Agent |
2025-10 |
1M |
|
|
230B MoE (10B active); interleaved thinking; open-source under Modified MIT; free API available |
| Infinigence AI |
Megrez2-3x7B-A3B Open |
Text |
2025-09 |
— |
|
|
MoE edge model: 3×7B experts, 3B active parameters. Open-weight, Apache 2.0. |
| Alibaba |
Qwen-Image Open |
Image Gen |
2025-08 |
— |
|
|
Image generation and editing foundation model. Exceptional Chinese + English text rendering. Supports style transfer, object insertion/removal, layered editing, and depth/edge estimation. Released Aug 2025. |
| ModelBest |
MiniCPM-V 4.5 Open |
Multimodal |
2025-08 |
— |
|
|
8B multimodal model (Qwen3-8B + SigLIP2). Apache 2.0. |
| Moonshot AI |
Kimi K2 Open |
Text |
2025-07 |
128K |
¥0.12/1K tokens |
¥0.12/1K tokens |
Agentic model with tool use, web browsing, and code execution |
| MiniMax |
MiniMax-M1 Open |
Text |
2025-06 |
1M |
$0.80/1M tokens |
$2.20/1M tokens |
First open-weight large-scale hybrid-attention reasoning model; test-time compute scaling |
| MiniMax |
Hailuo-02 Open |
Video Gen |
2025-06 |
N/A |
$0.06/video (5s) |
|
Physics-aware video generation; top-3 globally on VBench |
| Shengshu Technology |
Vidu Q3 Closed |
Video Gen |
2025-06 |
— |
~$0.07/sec |
|
World's first storytelling-focused video model. Up to 16 sec, 1080p 24fps, native audio sync. Multilingual. Pro and Turbo variants. MCP integration. |
| DeepSeek |
DeepSeek-R1-0528 Open |
Text |
2025-05 |
128K |
$0.50/1M tokens |
$2.15/1M tokens |
Latest R1 update: 685B, adds system prompt support, deeper reasoning (23K avg tokens on AIME vs 12K prior). Distilled 8B version achieves 86% on AIME 2024. |
| Kuaishou |
Kling 2.0 Closed |
Video Gen |
2025-04 |
N/A |
$0.16/video (5s) |
|
4K-capable video generation; advanced camera controls and scene coherence |
| Kuaishou |
Kolors 2.0 Open |
Image Gen |
2025-04 |
N/A |
$0.004/image |
|
Upgraded photorealistic image model; open-source available |
| SenseTime |
SenseNova V6 Pro Closed |
Multimodal |
2025-04 |
256K |
¥2.8/1M tokens |
¥8.4/1M tokens |
620B MoE hybrid. Real-time audio/video streaming. Ranked #1 in China in multimodal reasoning at launch. Lowest reasoning cost in industry at launch (Apr 2025). |
| Kunlun Tech |
Skywork-OR1-32B Open |
Text |
2025-04 |
— |
|
|
Open-source math and coding reasoning model (Apr 2025). 32B params; rivals DeepSeek-R1 on competitive programming benchmarks. 7B variant also available. |
| Baidu |
ERNIE X1 Closed |
Text |
2025-03 |
128K |
$0.28/1M tokens |
$1.10/1M tokens |
Baidu's first dedicated reasoning model with extended chain-of-thought |
| Baidu |
ERNIE 4.5 Closed |
Multimodal |
2025-03 |
128K |
$0.55/1M tokens |
$2.20/1M tokens |
Improved multimodal and reasoning; best ERNIE for Chinese enterprise tasks. The flagship ERNIE 4.5 model is closed-source, though several variants in the ERNIE 4.5 family are openly available on HuggingFace. |
| StepFun |
Step-3 Closed |
Text |
2025-01 |
1M |
$0.57/1M tokens |
$1.42/1M tokens |
StepFun's 2025 flagship; 1M token context and native reasoning mode |
| Baichuan AI |
Baichuan-M2 Open |
Text |
2025-01 |
— |
|
|
32B medical reasoning model (2025) built on Qwen2.5-32B with innovative Large Verifier System for real-world clinical reasoning. |
| ModelBest |
MiniCPM-o 2.6 Open |
Multimodal |
2025-01 |
— |
|
|
8B any-to-any model (text + vision + speech). Surpasses GPT-4o and Gemini 1.5 Pro on single-image understanding benchmarks. Launched Jan 2025. |
| Kuaishou |
Kling 1.6 Pro Closed |
Video Gen |
2024-12 |
N/A |
$0.14/video (5s) |
|
Previous flagship; supports 1080p, 5-10s clips, camera controls |
| Infinigence AI |
Megrez-3B-Omni Open |
Multimodal |
2024-12 |
— |
|
|
3B omni model processing text, vision, and audio. Outperforms LLaVA-NeXT-Yi-34B on vision benchmarks. Optimised for on-device and edge deployment. |
| Kunlun Tech |
Skywork-o1 Open |
Text |
2024-11 |
— |
|
|
Reasoning model (Nov 2024) — among China's first o1-style models with chain-of-thought Chinese logical reasoning. Open-source 8B variant (Llama 3.1 base) plus proprietary advanced version. |
| Baichuan AI |
Baichuan4-Turbo Closed |
Text |
2024-10 |
192K |
|
|
Enterprise flagship general LLM (late 2024). 10%+ usability gain vs prior generation; priced at ~80% of GPT-4o. Supports multimodal input. |
| Baichuan AI |
Baichuan4-Air Closed |
Text |
2024-10 |
192K |
|
|
MoE variant (PRI architecture) of Baichuan4. High performance at low cost for API deployments. |
| 01.AI |
Yi-Lightning Closed |
Text |
2024-10 |
16K |
$0.14/1M tokens |
$0.14/1M tokens |
MoE flagship API model (Oct 2024). Ranked #6 on Chatbot Arena at launch — joint 3rd among LLM companies. 40% faster inference than prior Yi models. Final model before 01.AI halted pre-training in early 2025. |
| StepFun |
Step-1V Closed |
Multimodal |
2024-09 |
200K |
¥0.034/1K tokens |
¥0.1/1K tokens |
Vision-language model supporting image input |
| 01.AI |
Yi-Coder Open |
Code |
2024-09 |
128K |
|
|
Code model (Sep 2024); 1.5B and 9B variants; supports 52 programming languages; 128K context window. |
| iFlyTek |
Spark 4.0 Closed |
Text |
2024-08 |
— |
|
|
Flagship LLM (Aug 2024). Claims comparable performance to GPT-4 Turbo on Chinese language benchmarks. |
| StepFun |
Step-2 Closed |
Text |
2024-07 |
256K |
¥0.038/1K tokens |
¥0.12/1K tokens |
Very long context window; strong document understanding |
| StepFun |
Step-1X-Image Closed |
Image Gen |
2024-07 |
N/A |
¥0.04/image |
|
High-resolution image generation model |
| 01.AI |
Yi-1.5 Open |
Text |
2024-05 |
— |
|
|
Open-source series (May 2024); 6B–34B variants. Improved coding, math, reasoning, and instruction-following over original Yi. |
| 01.AI |
Yi-VL-34B Open |
Multimodal |
2024-01 |
— |
|
|
34B vision-language model (early 2024). Open-weight multimodal extension of Yi-34B. |
| 01.AI |
Yi-34B Open |
Text |
2023-11 |
200K |
|
|
Founding open-source model (Nov 2023). Topped HuggingFace Open LLM Leaderboard and C-Eval at launch. 200K-token context variant (Yi-34B-200K) also available. |
| Kunlun Tech |
Skywork-13B Open |
Text |
2023-10 |
— |
|
|
Founding open-source bilingual LLM (Oct 2023). 13B params; pretrained on 3.2T tokens. Led same-scale models on C-Eval, CMMLU, and MMLU at launch. |
| Baichuan AI |
Baichuan2 Open |
Text |
2023-09 |
4K |
|
|
Open-source bilingual LLM series (Sep 2023); 7B and 13B variants; trained on 2.6T tokens. Available on Hugging Face under permissive license. |
| Baidu |
ERNIE-Speed Closed |
Text |
— |
128K |
|
|
Free tier for prototyping and light production |
| Meituan |
LongCat-Image Open |
Image Gen |
2025-12 |
N/A |
|
|
Image generation model, ~6B params; data-quality focused; released Dec 2025 |
| Meituan |
LongCat-Video Open |
Video Gen |
2025-10 |
N/A |
|
|
Text-to-video generation model; released Oct 2025 |
| Meituan |
LongCat-Flash-Omni Open |
Multimodal |
— |
256K |
|
|
Text + vision + audio multimodal |
| Meituan |
LongCat-Flash-Thinking Open |
Reasoning |
— |
256K |
|
|
Chain-of-thought reasoning variant of LongCat-Flash |
| Meituan |
LongCat-Flash-Chat Open |
Text |
— |
128K |
|
|
560B MoE, 27B active; open-source; 500K free tokens/day |
| WeRide |
WeRide GENESIS Closed |
Embodied |
— |
— |
|
|
Generative Engineered Neural Environment for Simulated Intelligence in Self-driving. WeRide's proprietary general-purpose simulation platform combining physical AI with generative AI. Rapidly builds photorealistic simulated cities in minutes, generates diverse edge-case scenarios from billions of km of real-world data, and models realistic pedestrian and driver behavior — all at centimeter-level fidelity. Supports L2++ through L4 AV development and validation via four modules: AI Scenarios, AI Agents, AI Metrics, and AI Diagnosis. |
| Pony.ai |
PonyWorld 2.0 Closed |
Embodied |
— |
— |
|
|
Second-generation world model underpinning the Virtual Driver L4 autonomous driving platform. Introduces an Intention layer — a structured representation of decision-making that enables the system to evaluate its own driving decisions, identify accuracy gaps across scenarios, and direct targeted data collection rather than broad undirected improvement. Direct sensor-to-action architecture with no language models in the inference pipeline; runs on 1016 TOPS across three NVIDIA DRIVE Orin-X SoCs with redundant failover. |
| Unisound |
UniGPT (Shanhai 山海) Closed |
Text |
— |
— |
|
|
60B+ parameter general large model (v5.0); underpins all Unisound vertical products. Medical, enterprise, and consumer deployments. |
| Unisound |
U2-ASR 2.5 Closed |
Audio |
— |
— |
|
|
First LLM-based semantic ASR model for Chinese. Covers 100+ dialects across 7 dialect systems. >90% accuracy. Available via Token Hub API. |
| Unisound |
U2-TTS / U2-TTS-Clone Closed |
Audio |
— |
— |
|
|
Text-to-speech and voice cloning with full-duplex millisecond response. Available via Token Hub API. |
| Unisound |
U1-OCR Closed |
Multimodal |
2026-02 |
— |
|
|
Industrial-grade document intelligence model for OCR and document understanding. Launched February 2026. |