10x Bench Results
See how LLMs perform when building the Przeprogramowani.pl website with the Astro + React + Tailwind + Cloudflare stack.
Generated: 2/26/2026, 3:11:30 PM • Total attempts: 74
Model Family Rankings
GPT-5.3-Codex
10 attempts via Codex Desktop (High Effort)
Cost: $1.75 / $14
Average: 8.5/10.0
Claude Opus 4.6
10 attempts via Claude Code (High Effort)
Cost: $5 / $25
Average: 7.5/10.0
Claude Sonnet 4.6
5 attempts via Claude Code (High Effort)
Cost: $3 / $15
Average: 7.1/10.0
Minimax M2.5
5 attempts via OpenCode
Cost: $0.3 / $2.4
Average: 6.9/10.0
GLM-5
5 attempts via OpenCode
Cost: $0.3 / $2.55
Average: 6.8/10.0
Gemini 3.1 Pro
5 attempts via Cursor
Cost: $2 / $12
Average: 6.7/10.0
Kimi K2.5
5 attempts via OpenCode
Cost: $0.6 / $3
Average: 6.3/10.0
Grok Code Fast 1
5 attempts via OpenCode
Cost: $0.2 / $1.5
Average: 5.9/10.0
Qwen 3 Max
3 attempts via OpenCode
Cost: $1.2 / $6
Average: 4.5/10.0
Devstral 2
3 attempts via OpenCode
Cost: $0.4 / $2
Average: 1.7/10.0
Detailed Comparison
| Criterion | GPT-5.3-Codex Attempt 1 | GPT-5.3-Codex Attempt 2 | GPT-5.3-Codex Attempt 3 | GPT-5.3-Codex Attempt 4 | GPT-5.3-Codex Attempt 5 | GPT-5.3-Codex Attempt 6 | GPT-5.3-Codex Attempt 7 | GPT-5.3-Codex Attempt 8 | GPT-5.3-Codex Attempt 9 | GPT-5.3-Codex Attempt 10 | Claude Opus 4.6 Attempt 1 | Claude Opus 4.6 Attempt 2 | Claude Opus 4.6 Attempt 3 | Claude Opus 4.6 Attempt 4 | Claude Opus 4.6 Attempt 5 | Claude Opus 4.6 Attempt 6 | Claude Opus 4.6 Attempt 7 | Claude Opus 4.6 Attempt 8 | Claude Opus 4.6 Attempt 9 | Claude Opus 4.6 Attempt 10 | Claude Sonnet 4.6 Attempt 1 | Claude Sonnet 4.6 Attempt 2 | Claude Sonnet 4.6 Attempt 3 | Claude Sonnet 4.6 Attempt 4 | Claude Sonnet 4.6 Attempt 5 | Minimax M2.5 Attempt 1 | Minimax M2.5 Attempt 2 | Minimax M2.5 Attempt 3 | Minimax M2.5 Attempt 4 | Minimax M2.5 Attempt 5 | GLM-5 Attempt 1 | GLM-5 Attempt 2 | GLM-5 Attempt 3 | GLM-5 Attempt 4 | GLM-5 Attempt 5 | Gemini 3.1 Pro Attempt 1 | Gemini 3.1 Pro Attempt 2 | Gemini 3.1 Pro Attempt 3 | Gemini 3.1 Pro Attempt 4 | Gemini 3.1 Pro Attempt 5 | Kimi K2.5 Attempt 1 | Kimi K2.5 Attempt 2 | Kimi K2.5 Attempt 3 | Kimi K2.5 Attempt 4 | Kimi K2.5 Attempt 5 | Grok Code Fast 1 Attempt 1 | Grok Code Fast 1 Attempt 2 | Grok Code Fast 1 Attempt 3 | Grok Code Fast 1 Attempt 4 | Grok Code Fast 1 Attempt 5 | Qwen 3 Max Attempt 1 | Qwen 3 Max Attempt 2 | Qwen 3 Max Attempt 3 | Devstral 2 Attempt 1 | Devstral 2 Attempt 2 | Devstral 2 Attempt 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local build | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| Manual testing | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
| Tech stack | 0.5 | 1 | 1 | 0.5 | 1 | 1 | 1 | 1 | 1 | 1 | 0.5 | 1 | 1 | 1 | 0.5 | 0.5 | 0 | 1 | 0.5 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 0.5 | 1 | 1 | 1 | 1 | 0.5 | 0 | 0.5 | 0.5 | 0.5 | 1 | 1 | 0 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0 | 1 | 0 | 0.5 | 0 | 0.5 | 0 | 0 | 0 | 0 |
| O nas page | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0 | 0 | 0.5 |
| Podcast page | 1 | 0.5 | 1 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0 | 0 | 0.5 |
| YouTube page | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 1 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 1 | 0.5 | 0.5 | 1 | 1 | 1 | 0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 1 | 0.5 | 0.5 | 1 | 1 | 0.5 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0.5 |
| Kursy section | 0.5 | 1 | 1 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0 | 0 | 0.5 |
| Consistent UI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| Responsive design | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.5 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0.5 | 1 | 1 | 1 | 1 | 1 | 1 | 0.5 | 1 | 0.5 | 0.5 | 0 | 0.5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0.5 | 0.5 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 |
| SEO Tags | 1 | 0.5 | 0.5 | 1 | 0.5 | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 | 1 | 1 | 1 | 1 | 1 | 0.5 | 1 | 1 | 1 | 0.5 | 1 | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0 | 0 | 0 |
| Penalty | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | -1 | -1 | N/A | N/A | -1 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Task completion time | 9min 19s | 9min 24s | 9min 9s | 8min 16s | 9min 40s | 8min 0s | 8min 20s | 9min 25s | 8min 36s | 8min 19s | 9min 36s | 10min 20s | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 7min 52s | 6min 48s | 5min 52s | 7min 1s | 6min 15s | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 16min 15s | 16min 21s | 8min 18s | 20min 39s | 15min 36s | N/A | N/A | N/A | N/A | N/A | N/A | 8min 23s | 5min 44s | 9min 45s | 3min 27s | 2min 20s |
| Test run | 9.02.2026 16:40 | 9.02.2026 16:40 | 9.02.2026 16:40 | 9.02.2026 22:58 | 9.02.2026 22:58 | 11.02.2026 21:41 | 11.02.2026 21:45 | 11.02.2026 21:46 | 11.02.2026 21:48 | 11.02.2026 21:50 | 9.02.2026 16:40 | 9.02.2026 16:40 | 9.02.2026 16:40 | 9.02.2026 22:45 | 9.02.2026 23:05 | 11.02.2026 21:28 | 11.02.2026 21:34 | 11.02.2026 21:32 | 11.02.2026 21:38 | 11.02.2026 21:40 | 17.02.2026 21:42 | 17.02.2026 21:44 | 17.02.2026 21:49 | 17.02.2026 21:50 | 17.02.2026 21:55 | 12.02.2026 19:34 | 12.02.2026 19:40 | 12.02.2026 19:39 | 12.02.2026 19:42 | 12.02.2026 19:45 | N/A | 16.02.2026 07:36 | 16.02.2026 08:36 | 16.02.2026 12:32 | 16.02.2026 09:05 | 26.02.2026 14:38 | 26.02.2026 14:41 | 26.02.2026 14:47 | 26.02.2026 14:53 | 26.02.2026 15:20 | 9.02.2026 19:10 | 9.02.2026 19:10 | 9.02.2026 19:10 | 9.02.2026 23:37 | 9.02.2026 23:37 | 12.02.2026 20:00 | 12.02.2026 19:55 | N/A | 12.02.2026 20:05 | 12.02.2026 20:20 | 9.02.2026 19:40 | 9.02.2026 19:40 | 9.02.2026 19:40 | 9.02.2026 19:35 | 9.02.2026 19:35 | 9.02.2026 19:35 |
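
The page does not state how the family averages are derived from the per-attempt rows. The sketch below works from an assumption: each attempt's score is the sum of its ten criterion scores (Local build through SEO Tags) plus any penalty, with N/A counted as 0, and the family average is the plain mean over attempts. The names `attempt_score` and `family_average` are hypothetical helpers, and the GLM-5 values are copied from the table. Under that assumption the result matches the published 6.8/10.0 for GLM-5.

```python
from statistics import mean

# Criterion scores for the five GLM-5 attempts, copied from the table
# (Local build through SEO Tags, in table order).
GLM5_CRITERIA = [
    [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 1, 0.5, 1],
    [1, 1, 0, 1, 0.5, 0.5, 0.5, 1, 1, 0.5],
    [1, 1, 0.5, 0.5, 0.5, 1, 0.5, 1, 1, 0.5],
    [1, 1, 0.5, 0.5, 0.5, 1, 0.5, 1, 1, 0.5],
    [1, 1, 0.5, 0.5, 0.5, 1, 0.5, 1, 1, 1],
]
# Penalty row: -1 where the table shows -1, 0 where it shows N/A.
GLM5_PENALTIES = [-1, -1, 0, 0, -1]


def attempt_score(criteria, penalty=0.0):
    """Assumed formula: sum of the ten criterion scores plus any penalty."""
    return sum(criteria) + penalty


def family_average(criteria_rows, penalties):
    """Assumed formula: plain mean of the per-attempt scores."""
    return mean(attempt_score(c, p) for c, p in zip(criteria_rows, penalties))


glm5_average = family_average(GLM5_CRITERIA, GLM5_PENALTIES)
print(f"GLM-5 average: {glm5_average:.1f}/10.0")
```

The same arithmetic applied to the Devstral 2 and Qwen 3 Max columns gives values that round to the published 1.7 and 4.5, which suggests, but does not prove, that this is how the rankings are computed.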