Saturday, May 17, 2008

Decrypter 1

This Decrypter is something I have been working on, on and off, for a couple of weeks now. As mentioned before, it's almost finished and just needs some polishing. But before I explain what needs to be done to finish it, I'll first go over the nuts and bolts. **Please note that this program is not intended for ANY malicious activities; I use it for solving the cryptograms in the newspaper.**
People seem to have a hard time understanding this program, so I'm going to go into the details first and then zoom out to the overall picture. First we must understand the frequency calculator. The program starts by calculating a set of frequencies from a large file, about 900 KB of text taken from books. It does this by taking each character together with the character or characters that come before it.
For example in the sentence:
"Dave Doesn't Dance Daily."
The frequencies (each one a previous character followed by the current character) would be:
"Da", "av", "ve", "e ", " D", "Do", "oe", "es", "sn", "n'", "'t", "t ", " D", "Da", "an", "nc", "ce", "e ", " D", "Da", "ai", "il", "ly", "y."
Notice that "Da" appears 3 times and " D" appears 3 times. Also notice that "e" is preceded by "v", "o", and "c", once each. So if you see an "e", there is a 33% chance that there was a "v" before it. Likewise, if you see a "D", there is a 75% chance that an "a" comes after it and a 25% chance that an "o" follows it. This uses only one previous character. With two previous characters the frequencies would look like "Dav", "ave", and "ve "... With three previous characters they would look like "Dave", "ave ", and "ve D"...
So hopefully you can see how this would apply to a larger file with more text. I plan on adding a program that shows how the frequencies of a file are calculated like this.
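
Until that program is up, here is a rough Python sketch of the counting step applied to the example sentence. It is not the Decrypter's actual code; the names `count_ngrams` and `group_by_context` and the `context` parameter are made up for this illustration.

```python
from collections import Counter, defaultdict

def count_ngrams(text, context=1):
    """Count each character together with the `context` characters before it."""
    size = context + 1
    return Counter(text[i:i + size] for i in range(len(text) - size + 1))

def group_by_context(counts):
    """Group counts by the leading characters, e.g. groups["D"] -> {"a": 3, "o": 1}."""
    groups = defaultdict(Counter)
    for gram, n in counts.items():
        groups[gram[:-1]][gram[-1]] += n
    return groups

sample = "Dave Doesn't Dance Daily."

# One previous character: "Da", "av", "ve", ...
pairs = count_ngrams(sample, context=1)
print(pairs["Da"], pairs[" D"])

# Turn the "D" group into percentages: which letters follow a "D"?
followers = group_by_context(pairs)["D"]
total = sum(followers.values())
for letter, n in followers.items():
    print(f'"{letter}" follows "D" {100 * n / total:.0f}% of the time')

# Two previous characters: "Dav", "ave", "ve ", ...
triples = count_ngrams(sample, context=2)
```

For the real program, `sample` would be replaced by the contents of the big text file; everything else stays the same.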

With the frequencies we can now attempt to solve the encrypted item. First we need to group the frequencies together. For example, one group would be all the letters that are preceded by a "D", like "Da" and "Do". Then we need to construct a guess. We extend the guess with the last letter of a frequency, like the "a" or the "o", and we pick which frequencies to try based on the last letters already in the guess, like the "D". After this we fill in the rest of the encrypted item with the part we just guessed. Then we have to check the newly added parts. Remember that we don't need to check the letters earlier in the guess, because they came from a frequency, but we do need to check all the parts filled in afterwards, because they may not match one. We check the guess using the frequencies: if a part of the guess is not in the frequencies, then it isn't a valid possibility and we shouldn't go any further with that guess. But if everything checks out, we repeat the same process with the new guess.

This may not be the best explanation, and hopefully I'll put up a pseudocode algorithm for this program. The pseudocode may clear up things in the above "understanding". If you have any questions, just post and I'll try to respond. Please also understand that this is a brief, poor explanation.
