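This benchmark does not check whether a channel conforms to the API specification. It tests only each channel's task-solving ability and compares results across multiple channels to measure the differences between them.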
Test runs: 757
Test tasks: 9
Total cost: $358.99
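The lower-right region of the chart contains the best-value models. The larger a bubble's radius, the faster the generation speed.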
| Site | Channel | Model | Pass rate | Avg. cost (USD) | Token TPS | Test runs |
|---|---|---|---|---|---|---|
| Packy Code | OFFICIAL | claude-opus-4-8 | 100% | $0.9868 | 25.6 | 29 |
| Micu | OFFICIAL | claude-opus-4-8 | 96.6% | $0.7391 | 28.3 | 29 |
| Right Code | OFFICIAL | claude-opus-4-8 | 96.6% | $0.9658 | 24.3 | 29 |
| CCTQ | OFFICIAL | claude-opus-4-8 | 93.1% | $0.8481 | 25.6 | 29 |
| SSSAI Code | OFFICIAL | claude-opus-4-8 | 93.1% | $0.4658 | 25.4 | 29 |
| Yunwu | OFFICIAL | claude-opus-4-8 | 89.7% | $1.529 | 23.7 | 29 |
| Somebody | AWSQ | claude-opus-4-8 | 89.7% | $0.2911 | 19.6 | 29 |
| 78 Code | OFFICIAL | claude-opus-4-8 | 89.7% | $0.6379 | 23.8 | 29 |
| Cubence | OFFICIAL | claude-opus-4-8 | 86.2% | $0.8658 | 26.8 | 29 |
| Duck Code | OFFICIAL | claude-opus-4-8 | 86.2% | $0.61 | 24.1 | 29 |
| TimiCC | OFFICIAL | claude-opus-4-8 | 86.2% | $0.6191 | 24.8 | 29 |
| Neko Code | OFFICIAL | gpt-5.5 | 86.2% | $0.0716 | 22.1 | 29 |
| Neko Code | OFFICIAL | claude-opus-4-8 | 82.8% | $0.6486 | 20.6 | 29 |
| Yes Code | OFFICIAL | gpt-5.5 | 82.8% | $0.2203 | 32.1 | 29 |
| Yes Code | OFFICIAL | claude-opus-4-8 | 82.8% | $0.7998 | 19.8 | 29 |
| TimiCC | OFFICIAL | gpt-5.5 | 82.8% | $0.0885 | 23.9 | 29 |
| Micu | OFFICIAL | gpt-5.5 | 82.8% | $0.0732 | 23.7 | 29 |
| Cubence | OFFICIAL | gpt-5.5 | 82.8% | $0.1226 | 23.3 | 29 |
| SSSAI Code | OFFICIAL | gpt-5.5 | 79.3% | $0.1886 | 28.5 | 29 |
| CCTQ | OFFICIAL | gpt-5.5 | 79.3% | $0.0323 | 24.5 | 29 |
| Right Code | OFFICIAL | gpt-5.5 | 78.6% | $0.0426 | 22.7 | 28 |
| Packy Code | OFFICIAL | gpt-5.5 | 75.9% | $0.1084 | 22.6 | 29 |
| IKun Code | OFFICIAL | claude-opus-4-8 | 75.9% | $1.1783 | 24.1 | 29 |
| Duck Code | OFFICIAL | gpt-5.5 | 75.9% | $0.1544 | 19.4 | 29 |
| IKun Code | OFFICIAL | gpt-5.5 | 72.4% | $0.0471 | 22.8 | 29 |
| 78 Code | OFFICIAL | gpt-5.5 | 72.4% | $0.0407 | 21.8 | 29 |
| Codex For | OFFICIAL | gpt-5.5 | 50% | $0.0334 | 27.5 | 4 |
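Results are keyed by test task and show scores on real programming tasks from benchmarks such as terminal-bench and aider-polyglot, rather than API connectivity checks.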
Historical pass rate = passed trials ÷ total trials × 100%. The horizontal line is the arithmetic mean of the historical pass rates of all configs.
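The two definitions above can be illustrated with a minimal sketch. The function names `pass_rate` and `baseline`, the config labels, and the `(passed, total)` counts are hypothetical inputs chosen for illustration, not figures published by the site.

```python
def pass_rate(passed: int, total: int) -> float:
    """Historical pass rate in percent: passed trials / total trials * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total * 100


def baseline(rates: list[float]) -> float:
    """Horizontal line: unweighted arithmetic mean of per-config pass rates."""
    if not rates:
        raise ValueError("need at least one config")
    return sum(rates) / len(rates)


# Hypothetical (passed, total) trial counts per config.
configs = {"config-a": (28, 29), "config-b": (2, 4)}

rates = {name: pass_rate(p, t) for name, (p, t) in configs.items()}
line = baseline(list(rates.values()))

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
print(f"horizontal line: {line:.1f}%")
```

Because the mean is taken over configs rather than over trials, a config with 4 trials weighs as much as one with 29. In this sketch the line sits at about 73.3%, whereas pooling all 33 trials would give about 90.9%.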
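Statistics cover every historical trial of each channel (config). A task/config pair with fewer than 3 historical trials is marked "insufficient sample" so that small samples do not mislead. Metrics include pass rate, average tokens, average cost, and Token TPS.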
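The aggregation described above might look roughly like the following sketch. The `Trial` record shape, the `summarize` function, and the sample data are all hypothetical. In particular, Token TPS is assumed here to be total tokens divided by total wall-clock seconds; the page does not state how it is actually computed.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_TRIALS = 3  # threshold stated in the text: fewer than 3 trials is flagged


@dataclass
class Trial:  # hypothetical record shape, not the site's schema
    task: str
    config: str
    passed: bool
    tokens: int
    cost_usd: float
    seconds: float


def summarize(trials):
    """Group trials by (task, config) and compute the listed metrics."""
    groups = defaultdict(list)
    for t in trials:
        groups[(t.task, t.config)].append(t)

    summary = {}
    for key, group in groups.items():
        n = len(group)
        total_tokens = sum(t.tokens for t in group)
        total_seconds = sum(t.seconds for t in group)
        summary[key] = {
            "trials": n,
            "pass_rate": sum(t.passed for t in group) / n * 100,
            "avg_tokens": total_tokens / n,
            "avg_cost_usd": sum(t.cost_usd for t in group) / n,
            # Assumption: TPS = total tokens / total wall-clock seconds.
            "token_tps": total_tokens / total_seconds if total_seconds else 0.0,
            "insufficient_sample": n < MIN_TRIALS,
        }
    return summary


# Hypothetical sample data: config-a has 2 trials, config-b has 3.
trials = [
    Trial("task-1", "config-a", True, 1000, 0.50, 40.0),
    Trial("task-1", "config-a", False, 3000, 1.50, 120.0),
    Trial("task-1", "config-b", True, 2000, 0.10, 80.0),
    Trial("task-1", "config-b", True, 2000, 0.10, 80.0),
    Trial("task-1", "config-b", False, 2000, 0.10, 90.0),
]
result = summarize(trials)
for key, stats in result.items():
    print(key, stats)
```

With this data, `config-a` has only two trials for `task-1` and is flagged as an insufficient sample, so its 50% pass rate would be shown with a warning rather than read as a reliable score.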