🧠 AI Model Performance

Transparent insights into how our AI models perform. See accuracy, calibration, and reliability metrics.

🏆 Best Accuracy: GEMINI (69.7% correct)
🎯 Best Calibrated: GEMINI (8.0% error)
📊 Total Predictions: 230 (across all models)
📈 Avg Brier Score: 0.223 (lower is better)
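
As a cross-check, here is a minimal sketch (not the site's own code; the dictionary and variable names are illustrative) showing how these headline figures follow from the per-model numbers in the comparison below. Whether the average Brier score is a simple or a prediction-weighted mean is not stated; both land at 0.223 here.

```python
# Per-model figures copied from the Model Comparison section below.
models = {
    "Gemini": {"predictions": 89, "correct": 62, "brier": 0.210, "calibration_error": 0.080},
    "Claude": {"predictions": 76, "correct": 50, "brier": 0.240, "calibration_error": 0.110},
    "OpenAI": {"predictions": 65, "correct": 44, "brier": 0.220, "calibration_error": 0.090},
}

total_predictions = sum(m["predictions"] for m in models.values())  # 230
best_accuracy = max(models, key=lambda k: models[k]["correct"] / models[k]["predictions"])
best_calibrated = min(models, key=lambda k: models[k]["calibration_error"])
# Prediction-count-weighted mean of the per-model Brier scores (~0.223).
avg_brier = sum(m["brier"] * m["predictions"] for m in models.values()) / total_predictions

print(best_accuracy, best_calibrated, total_predictions, round(avg_brier, 3))
# Gemini Gemini 230 0.223
```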

⚖️ Model Comparison

Gemini (89 predictions)

Accuracy: 69.7%
Brier Score: 0.210
Calibration Error: 8.0%
Correct Calls: 62/89

Calibration by confidence (predicted vs. actual per bin):
0-49% (n=5), 50-59% (n=18), 60-69% (n=32), 70-79% (n=25), 80-89% (n=8), 90-100% (n=1)
Claude (76 predictions)

Accuracy: 65.8%
Brier Score: 0.240
Calibration Error: 11.0%
Correct Calls: 50/76

Calibration by confidence (predicted vs. actual per bin):
0-49% (n=8), 50-59% (n=22), 60-69% (n=28), 70-79% (n=14), 80-89% (n=4)
OpenAI (65 predictions)

Accuracy: 67.7%
Brier Score: 0.220
Calibration Error: 9.0%
Correct Calls: 44/65

Calibration by confidence (predicted vs. actual per bin):
0-49% (n=4), 50-59% (n=15), 60-69% (n=25), 70-79% (n=17), 80-89% (n=4)

📚 Understanding the Metrics

🎯 Accuracy

The percentage of predictions that turned out correct. Higher is better.
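For example, the Gemini figure above comes from 62 correct calls out of 89 predictions: 62 / 89 ≈ 69.7%.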

📊 Brier Score

Measures the quality of probabilistic predictions: the mean squared difference between the stated probability and the actual 0-or-1 outcome. Range: 0 (perfect) to 1 (worst). Lower is better.
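
A minimal sketch of that formula (illustrative, not the dashboard's code):

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A 70%-confidence call that comes true contributes (0.7 - 1)^2 = 0.09;
# the same call failing contributes (0.7 - 0)^2 = 0.49.
print(round(brier_score([0.7, 0.7], [1, 0]), 2))  # 0.29
```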

⚖️ Calibration Error

How closely the predicted confidence matches actual accuracy. A well-calibrated model's 70% predictions are correct 70% of the time.
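
The page does not say exactly how this number is computed; a common choice is the expected calibration error, which groups predictions into confidence bins and averages the per-bin gap between mean confidence and accuracy, weighted by bin size. A sketch under that assumption:

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Bin predictions by confidence, then average |mean confidence - accuracy|
    per bin, weighted by bin size. One common definition (ECE); the dashboard's
    exact formula is not stated."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # bin index 0..n_bins-1
        counts[b] += 1
        conf_sums[b] += p
        hit_sums[b] += o
    n = len(probs)
    return sum(
        (c / n) * abs(conf_sums[i] / c - hit_sums[i] / c)
        for i, c in enumerate(counts)
        if c
    )

# Two 70%-confidence calls, one right and one wrong: mean confidence 0.7 vs.
# accuracy 0.5, so the error is |0.7 - 0.5| = 0.2.
print(round(expected_calibration_error([0.7, 0.7], [1, 0]), 2))  # 0.2
```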

📈 Calibration Chart

Visual comparison of predicted confidence vs. actual outcomes in each confidence bin. The two bars should align when the model is well-calibrated.
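
As a rough illustration (not the site's code), the two bars can be computed by grouping resolved predictions into the same confidence bins used above and taking each bin's mean predicted confidence and its actual hit rate:

```python
from collections import defaultdict

def bin_label(p):
    """Map a predicted probability to the bins used in the charts above."""
    edges = [(0.50, "0-49%"), (0.60, "50-59%"), (0.70, "60-69%"),
             (0.80, "70-79%"), (0.90, "80-89%")]
    for upper, label in edges:
        if p < upper:
            return label
    return "90-100%"

def calibration_bars(probs, outcomes):
    """Per bin: (mean predicted confidence, actual accuracy, n), i.e. the two
    bar heights and the n= count shown in each chart."""
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[bin_label(p)].append((p, o))
    return {
        label: (
            sum(p for p, _ in pairs) / len(pairs),  # "Predicted" bar
            sum(o for _, o in pairs) / len(pairs),  # "Actual" bar
            len(pairs),
        )
        for label, pairs in groups.items()
    }

# Example: three resolved predictions.
print(calibration_bars([0.75, 0.75, 0.65], [1, 0, 1]))
# {'70-79%': (0.75, 0.5, 2), '60-69%': (0.65, 1.0, 1)}
```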