Symmetry Augmentation Experiment

Question

Does Blue/Gold symmetry augmentation provide structural value beyond simply having more training data?

Background

Killer Queen has perfect Blue/Gold symmetry: any game state in which Blue holds resources A and Gold holds resources B is strategically equivalent, viewed from the opposite team's perspective, to one in which Blue holds B and Gold holds A. The symmetry.swap_teams(X, y) function exploits this by permuting the 52-feature vector (swapping blue[0:20] with gold[20:40] and negating maiden control and snail direction) and flipping the label.

This creates "free" training data, but the question is whether it helps the model learn the invariance, or just acts like more rows.
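To make the augmentation concrete, here is a minimal sketch of what a swap_teams implementation could look like. The blue[0:20] and gold[20:40] blocks are stated in the text; the SIGNED indices for maiden control and snail direction are placeholders, since the real feature layout is not given here.

```python
import numpy as np

# Feature layout: blue[0:20] and gold[20:40] are from the text;
# the signed-feature indices below are assumed for illustration.
BLUE = slice(0, 20)     # blue team features
GOLD = slice(20, 40)    # gold team features
SIGNED = [40, 41]       # e.g. maiden control, snail direction (indices assumed)

def swap_teams(X: np.ndarray, y: np.ndarray):
    """Return the Blue/Gold-mirrored copy of (X, y).

    X: (n, 52) feature matrix; y: (n,) binary labels (1 = Blue wins).
    """
    Xs = X.copy()
    Xs[:, BLUE], Xs[:, GOLD] = X[:, GOLD], X[:, BLUE]  # swap team blocks
    Xs[:, SIGNED] = -Xs[:, SIGNED]                     # negate perspective-signed features
    ys = 1 - y                                         # flip the label
    return Xs, ys
```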

Design

Three conditions, all trained with LightGBM (200 leaves, 200 trees):

| Condition  | Construction                              | Rows    |
|------------|-------------------------------------------|---------|
| X          | 100k random training rows                 | 100,000 |
| X + sym(X) | same 100k + their symmetry-swapped copies | 200,000 |
| 2X         | 200k random training rows (X + 100k fresh)| 200,000 |

X + sym(X) and 2X have the same row count, so any difference between them isolates the effect of symmetry structure from raw data volume.

Test set: 1,666,521 holdout rows (unchanged across conditions).
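A sketch of how the three conditions might be constructed and trained, reusing the swap_teams helper sketched above. The (X_pool, y_pool) stand-in data is synthetic and only there to make the snippet runnable; in the experiment these rows come from the real game-state dataset.

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)

# Stand-in pool so the snippet runs end to end (synthetic, not real data).
X_pool = rng.normal(size=(300_000, 52)).astype(np.float32)
y_pool = rng.integers(0, 2, size=300_000)

idx = rng.permutation(len(X_pool))
X1, y1 = X_pool[idx[:100_000]], y_pool[idx[:100_000]]    # condition X
X2, y2 = X_pool[idx[:200_000]], y_pool[idx[:200_000]]    # condition 2X (X + fresh rows)
Xs, ys = swap_teams(X1, y1)                              # mirrored copies of X
X_aug = np.vstack([X1, Xs])                              # condition X + sym(X)
y_aug = np.concatenate([y1, ys])

def fit(X, y):
    # 200 leaves, 200 trees, matching the setup above
    return lgb.LGBMClassifier(num_leaves=200, n_estimators=200).fit(X, y)

models = {"X": fit(X1, y1), "X+sym(X)": fit(X_aug, y_aug), "2X": fit(X2, y2)}
```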

Results

| Condition  | Train Size | Log Loss | Accuracy | Sym Consistency | Egg Inv |
|------------|------------|----------|----------|-----------------|---------|
| X (100k)   | 100,000    | 0.5780   | 68.7%    | 0.0829          | 7.15%   |
| X + sym(X) | 200,000    | 0.5708   | 69.0%    | 0.0396          | 5.75%   |
| 2X (200k)  | 200,000    | 0.5709   | 69.1%    | 0.0639          | 5.85%   |

Symmetry consistency = mean |P(state) + P(swap(state)) - 1|; 0 is perfect.
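Under the same assumed swap_teams helper, the consistency metric is a few lines: score each holdout state and its mirror, then average the deviation from P(state) + P(swap(state)) = 1.

```python
import numpy as np

def symmetry_consistency(model, X_test: np.ndarray) -> float:
    """Mean |P(state) + P(swap(state)) - 1| over the holdout; 0 is perfect."""
    X_mirror, _ = swap_teams(X_test, np.zeros(len(X_test), dtype=int))  # labels unused
    p = model.predict_proba(X_test)[:, 1]          # P(Blue wins | state)
    p_mirror = model.predict_proba(X_mirror)[:, 1] # P(Blue wins | mirrored state)
    return float(np.mean(np.abs(p + p_mirror - 1.0)))
```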

Key Comparisons

| Comparison                              | Log Loss Delta | Sym Consistency                 |
|-----------------------------------------|----------------|---------------------------------|
| X+sym(X) vs X (augmentation helps?)     | -0.0073        | 0.0396 vs 0.0829 (2.1x better)  |
| X+sym(X) vs 2X (symmetry vs data?)      | -0.0002        | 0.0396 vs 0.0639 (1.6x better)  |
| 2X vs X (more data helps?)              | -0.0071        | 0.0639 vs 0.0829 (1.3x better)  |

Conclusions

  1. Log loss: X+sym(X) and 2X are essentially tied (0.5708 vs 0.5709). Symmetry augmentation matches the predictive benefit of 2x real data.

  2. Symmetry consistency: X+sym(X) is substantially better than 2X (0.0396 vs 0.0639). The augmented model's predictions are 1.6x more self-consistent under Blue/Gold swap. This is structural value that more raw data alone does not provide.

  3. Recommendation: Use symmetry augmentation in production training. It is essentially free (a numpy permutation plus a sign and label flip, with no extra data collection), and it produces a model that better respects the game's inherent symmetry without sacrificing predictive accuracy.