§ Widget · 05 Factorized Nucleotide Supervision

The model scores the whole 6-mer at once, but FNS reads it one position at a time: for each slot, it sums the probability of every 6-mer that carries the observed base there. Edit the ground-truth 6-mer to watch the factorization and the loss update.

Reading the notation. The model only ever predicts whole 6-mers, so it never directly reports a probability for a single base. p⁽¹⁾(A) is the quantity we want: the probability that position 1 of the next 6-mer is A. We recover it by marginalizing, that is, by summing the probability of every 6-mer that carries A in slot 1 while the other five positions range over all bases (the shuffling boxes, written Σ*).
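The marginalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the widget's actual code: the alphabet `BASES = "ACGT"`, the helper name `position_marginal`, and the uniform stand-in for the model's output are all assumptions made for the example.

```python
import itertools

BASES = "ACGT"  # assumed four-letter alphabet


def position_marginal(kmer_probs, position, base):
    """Sum the probability of every k-mer that carries `base` at `position`.

    `kmer_probs` maps each k-mer string to its probability; `position` is
    0-indexed, so slot 1 in the text is index 0 here.
    """
    return sum(p for kmer, p in kmer_probs.items() if kmer[position] == base)


# Hypothetical model output: a uniform distribution over all 4**6 6-mers.
kmers = ["".join(t) for t in itertools.product(BASES, repeat=6)]
uniform = {kmer: 1.0 / len(kmers) for kmer in kmers}

# Probability that slot 1 is A, with the other five slots free to vary.
p1_A = position_marginal(uniform, 0, "A")
```

Under the uniform stand-in, a quarter of all 6-mers carry A in slot 1, so `p1_A` comes out at one quarter; a trained model would instead concentrate mass on the 6-mers it considers likely.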

Computing the loss. Doing this at each slot gives six per-position probabilities, one for each base of the ground-truth 6-mer. Their product is the factorized probability of getting every position right, treating the six slots as independent; taking −log turns it into a loss, and dividing by 6 averages it per base. That is the loss = − ¹⁄₆ log(…) expression above. Because the credit is split across positions, a near-miss like TATATT still scores well for the five bases it got right. Plain 6-mer cross-entropy, by contrast, looks only at the probability of the exact ground-truth 6-mer, so it treats every other 6-mer as equally wrong.
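To make the comparison concrete, here is a hedged sketch of the loss computation next to plain 6-mer cross-entropy. The function names, the ground truth `TATATA`, and the two toy distributions are hypothetical, chosen only so that one prediction misses a single base and the other misses all six.

```python
import math


def fns_loss(kmer_probs, truth):
    """Average negative log of the per-position marginals of `truth`.

    `kmer_probs` maps k-mer strings to probabilities; k-mers that are not
    listed are treated as having probability zero.
    """
    log_prob = 0.0
    for position, base in enumerate(truth):
        # Marginal probability that this slot holds the ground-truth base.
        marginal = sum(
            p for kmer, p in kmer_probs.items() if kmer[position] == base
        )
        log_prob += math.log(marginal)
    return -log_prob / len(truth)


def kmer_cross_entropy(kmer_probs, truth):
    """Plain cross-entropy: only the exact ground-truth k-mer counts."""
    return -math.log(kmer_probs[truth])


truth = "TATATA"  # hypothetical ground truth

# Toy predictions: most of the mass on one wrong 6-mer, a little on the truth.
near_miss = {"TATATT": 0.97, "TATATA": 0.03}   # wrong in the last slot only
total_miss = {"CGCGCG": 0.97, "TATATA": 0.03}  # wrong in every slot

fns_near = fns_loss(near_miss, truth)
fns_total = fns_loss(total_miss, truth)
ce_near = kmer_cross_entropy(near_miss, truth)
ce_total = kmer_cross_entropy(total_miss, truth)
```

In this toy setup the two cross-entropy values are identical, because both predictions give the exact ground truth the same probability. The FNS loss separates them: the near-miss is penalized in one slot only, so its loss is one sixth of the total miss's.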