Information About the Vigenère Cipher Codebreaker Algorithm
Vigenère Ciphers
A Vigenère cipher shifts each character of a plain text message a number of positions
based on a keyword. Essentially, a Vigenère cipher consists of several Caesar
ciphers in sequence with different shift values.
The shift value for any given character is based on the keyword. The keyword
is repeated so that it is the same length of the message. Then, the corresponding
keyword character determines the shift for it's respective message character.
Codebreaking Without a Key
Guessing the Key Length
Our first step is to examine repetitions in the encrypted text so we can guess at
the length of the key. In a Vigenère cipher, common words such as "the" or
"that" can be encrypted differently each time. However, if the message is
long enough, repetitions in the code can still be exploited. This codebreaker
analyzes the space between these repetitions to make a guess at the key length.
Guessing the Most Probable Key
After making an educated guess at the key length, we now need to guess at the
key text itself. Research of the English language has produced a table of
how often each letter of the English alphabet is used in the English language.
For example, "e" is used most often, so for most encryptions we can guess that the
most often used character is probably actually an "e". With a Vigenère cipher,
this is difficult because an "e" could be encrypted in multiple ways based on the
key. Counting the characters won't work.
However, if we know that the key is of length N, then we know that every Nth character
is encrypted the same way. We can split the whole encrypted message into N
messages by lumping the characters that are encrypted with the same shift.
Then we essentially have N different messages that are normal Caesar shifts.
Each is shifted by one value - a single character of the key. We can apply
frequency analysis to each individual message to guess at those single characters.
We combine our best guesses at each character to get the best keyword guess.
Looking At More Keys
Since the above method makes many assumptions, it's likely that the best guess
is not the key. This codebreak tries to overcome that limitation by making
many best guesses. For example, the 2nd best guess may use the 2nd most likely
letter based on frequency analysis to guess the 1st character of the key.
We find about 1000 best guesses.
Then we further rank the "strength" of these best guesses. A guess is considered
"stronger" if it contains a word from the English dictionary. After all, a
key is most likely going to be an actual word or multiple words. A guess is
considered even stronger if it contains special geocaching terms like "cache."
The best of the best guesses rank higher and are displayed higher on the list.