🧠 AI Model Performance

Transparent insights into how our AI models perform. See accuracy, calibration, and reliability metrics.

🏆 Best Accuracy: GEMINI (69.7% correct)
🎯 Best Calibrated: GEMINI (8.0% error)
📊 Total Predictions: 230 (across all models)
📈 Avg Brier Score: 0.223 (lower is better)
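
As a cross-check, here is a minimal sketch (not the site's own code; the dictionary and variable names are illustrative) showing how these headline figures follow from the per-model numbers in the comparison below. Whether the average Brier score is a simple or a prediction-weighted mean is not stated; both land at 0.223 here.

```python
# Per-model figures copied from the Model Comparison section below.
models = {
    "Gemini": {"predictions": 89, "correct": 62, "brier": 0.210, "calibration_error": 0.080},
    "Claude": {"predictions": 76, "correct": 50, "brier": 0.240, "calibration_error": 0.110},
    "OpenAI": {"predictions": 65, "correct": 44, "brier": 0.220, "calibration_error": 0.090},
}

total_predictions = sum(m["predictions"] for m in models.values())  # 230
best_accuracy = max(models, key=lambda k: models[k]["correct"] / models[k]["predictions"])
best_calibrated = min(models, key=lambda k: models[k]["calibration_error"])
# Prediction-count-weighted mean of the per-model Brier scores (~0.223).
avg_brier = sum(m["brier"] * m["predictions"] for m in models.values()) / total_predictions

print(best_accuracy, best_calibrated, total_predictions, round(avg_brier, 3))
# Gemini Gemini 230 0.223
```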

⚖️ Model Comparison

Gemini (89 predictions)

Accuracy: 69.7%
Brier Score: 0.210
Calibration Error: 8.0%
Correct Calls: 62/89

Calibration by confidence (predicted vs. actual per bin):
0-49% (n=5), 50-59% (n=18), 60-69% (n=32), 70-79% (n=25), 80-89% (n=8), 90-100% (n=1)
Claude (76 predictions)

Accuracy: 65.8%
Brier Score: 0.240
Calibration Error: 11.0%
Correct Calls: 50/76

Calibration by confidence (predicted vs. actual per bin):
0-49% (n=8), 50-59% (n=22), 60-69% (n=28), 70-79% (n=14), 80-89% (n=4)
OpenAI (65 predictions)

Accuracy: 67.7%
Brier Score: 0.220
Calibration Error: 9.0%
Correct Calls: 44/65

Calibration by confidence (predicted vs. actual per bin):
0-49% (n=4), 50-59% (n=15), 60-69% (n=25), 70-79% (n=17), 80-89% (n=4)

📚 Understanding the Metrics

🎯 Accuracy

The percentage of predictions that turned out correct. Higher is better.
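For example, the Gemini figure above comes from 62 correct calls out of 89 predictions: 62 / 89 ≈ 69.7%.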

📊 Brier Score

Measures the quality of probabilistic predictions: the mean squared difference between the stated probability and the actual 0-or-1 outcome. Range: 0 (perfect) to 1 (worst). Lower is better.
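
A minimal sketch of that formula (illustrative, not the dashboard's code):

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A 70%-confidence call that comes true contributes (0.7 - 1)^2 = 0.09;
# the same call failing contributes (0.7 - 0)^2 = 0.49.
print(round(brier_score([0.7, 0.7], [1, 0]), 2))  # 0.29
```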

⚖️ Calibration Error

How closely the predicted confidence matches actual accuracy. A well-calibrated model's 70% predictions are correct 70% of the time.
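
The page does not say exactly how this number is computed; a common choice is the expected calibration error, which groups predictions into confidence bins and averages the per-bin gap between mean confidence and accuracy, weighted by bin size. A sketch under that assumption:

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Bin predictions by confidence, then average |mean confidence - accuracy|
    per bin, weighted by bin size. One common definition (ECE); the dashboard's
    exact formula is not stated."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # bin index 0..n_bins-1
        counts[b] += 1
        conf_sums[b] += p
        hit_sums[b] += o
    n = len(probs)
    return sum(
        (c / n) * abs(conf_sums[i] / c - hit_sums[i] / c)
        for i, c in enumerate(counts)
        if c
    )

# Two 70%-confidence calls, one right and one wrong: mean confidence 0.7 vs.
# accuracy 0.5, so the error is |0.7 - 0.5| = 0.2.
print(round(expected_calibration_error([0.7, 0.7], [1, 0]), 2))  # 0.2
```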

📈 Calibration Chart

Visual comparison of predicted confidence vs. actual outcomes in each confidence bin. The two bars should align when the model is well-calibrated.
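
As a rough illustration (not the site's code), the two bars can be computed by grouping resolved predictions into the same confidence bins used above and taking each bin's mean predicted confidence and its actual hit rate:

```python
from collections import defaultdict

def bin_label(p):
    """Map a predicted probability to the bins used in the charts above."""
    edges = [(0.50, "0-49%"), (0.60, "50-59%"), (0.70, "60-69%"),
             (0.80, "70-79%"), (0.90, "80-89%")]
    for upper, label in edges:
        if p < upper:
            return label
    return "90-100%"

def calibration_bars(probs, outcomes):
    """Per bin: (mean predicted confidence, actual accuracy, n), i.e. the two
    bar heights and the n= count shown in each chart."""
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[bin_label(p)].append((p, o))
    return {
        label: (
            sum(p for p, _ in pairs) / len(pairs),  # "Predicted" bar
            sum(o for _, o in pairs) / len(pairs),  # "Actual" bar
            len(pairs),
        )
        for label, pairs in groups.items()
    }

# Example: three resolved predictions.
print(calibration_bars([0.75, 0.75, 0.65], [1, 0, 1]))
# {'70-79%': (0.75, 0.5, 2), '60-69%': (0.65, 1.0, 1)}
```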