Tuesday, May 20, 2008

Decrypter 3

Diving deeper into the decrypter, there are some interesting things one could notice. As said before, the decrypter analyzes the character frequency of a file before solving the encrypted file. When solving, the more "previous characters" analyzed (i.e. for lunch, zero through 3 previous characters: "h", "c'h", "nc'h", "unc'h"), the faster and more accurately it will solve. However, if too many previous characters are used it won't work.

Additionally, I think that this program is extremely similar to a backtracking sudoku solver. If you replace the numbers 1 through 9 with the frequencies, and instead of checking the rows, columns, and individual squares you check whether the last bit you added to your guess exists in the English language, then you have the decrypter. In fact you could probably modify the decrypter's check function (in the Guess class), then change the encrypted file to something like "123456789123456789 ... 123456789" (9 sets of 1 through 9), change the file that the frequencies are read from to "123456789" (and use zero previous characters), add the givens, and it should work.
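To make the "previous characters" idea and the backtracking comparison a bit more concrete, here is a minimal Python sketch. It is not the decrypter's actual code: the names build_contexts, last_char_plausible and solve are made up for illustration, and it only records which characters were ever seen after a given context instead of real frequencies. It also assumes a simple one-to-one substitution cipher.

```python
from collections import defaultdict


def build_contexts(text, n):
    """Map every context of 0..n previous characters to the characters seen after it."""
    seen = defaultdict(set)
    for i, ch in enumerate(text):
        for k in range(n + 1):
            if i - k < 0:
                break  # not enough previous characters this early in the text
            seen[text[i - k:i]].add(ch)
    return seen


def last_char_plausible(guess, seen, n):
    """Check only the most recently added character, like a sudoku solver checks one cell."""
    if not guess:
        return True
    k = min(n, len(guess) - 1)  # use as many previous characters as are available
    context = guess[len(guess) - 1 - k:len(guess) - 1]
    return guess[-1] in seen.get(context, set())


def solve(cipher, seen, n, alphabet):
    """Backtracking search for a one-to-one letter mapping (hypothetical sketch)."""
    mapping = {}  # cipher character -> plain character
    used = set()  # plain characters already assigned

    def step(i, guess):
        if i == len(cipher):
            return guess
        c = cipher[i]
        fresh = c not in mapping
        candidates = [p for p in alphabet if p not in used] if fresh else [mapping[c]]
        for p in candidates:
            if fresh:
                mapping[c] = p
                used.add(p)
            if last_char_plausible(guess + p, seen, n):
                result = step(i + 1, guess + p)
                if result is not None:
                    return result
            if fresh:  # undo the assignment and try the next candidate
                del mapping[c]
                used.discard(p)
        return None  # dead end, the caller backtracks

    return step(0, "")


seen = build_contexts("lunch", 3)
print(sorted(k for k, v in seen.items() if "h" in v))  # contexts seen before "h"
print(solve("mvodi", seen, 3, "lunch"))  # prints lunch
```

For "lunch" with 3 previous characters, the contexts recorded in front of "h" are the empty string, "c", "nc" and "unc", which matches the example above. A real run would build the table from a large English text and would probably try candidates in frequency order rather than in a fixed alphabet order.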

For any of you who have tried out the decrypter source code (and I know that as I'm typing this no one has), you may run into some bugs and/or notice that it can be very slow. I'll take some time to mention why this is and what I plan to do to improve the speed. As far as bugs are concerned, I am not aware of the causes beyond speculation, but I do know at least one exists.

Some of the bugs and/or reasons that it is slow:

1. The first known bug is that when it begins to solve there is a possibility (very, very small) that it will blow up. I have no idea why this happens; I didn't pursue that individual bug but rather some others. This bug, I guess, is kind of a leftover that I'm not too worried about.
2. Right now the source code is set to calculate zero->n previous characters. This is completely unnecessary and is actually a result of me not designing my software well. Having the decrypter use the previous characters from zero to n causes it to check every single possibility for each number of previous characters up until n, at which point it only checks the frequencies necessary. Although that last statement may sound bad (or just confusing), it should be noted that doing it this way really only creates a dramatic slowdown when generating the frequencies.
3. It requires an extremely large file for the frequencies. The 900 KB file, when opened in Word, was ~250 pages long. This also causes the program to take a ridiculous amount of time to create the frequencies.
4. The last problem is that it confuses punctuation. It may produce the right answer but with, say, commas instead of periods. This isn't that big a deal.


Overall, even with the bugs and slowdown, I still think that using this to solve a cryptogram in the newspaper is worth it. I solved one in about 15 minutes with my 2.16 GHz processor working at 50%. I did need about 1.1-1.3 GB of RAM though, so watch out if you are using less than 2 GB of RAM (what I have).
