Saturday, February 13, 2010

Working of CAPTCHA

How CAPTCHA Works

 

You're using your computer to purchase tickets to see They Might Be Giants play a concert at a local venue. Before you can buy the tickets, you first have to pass a test. It's not a hard test -- in fact, that's the point. For you, the test should be simple and straightforward. But for a computer, the test should be almost impossible to solve.
This sort of test is a CAPTCHA, an acronym that stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. CAPTCHAs are also known as a type of Human Interaction Proof (HIP). You've probably seen CAPTCHA tests on lots of Web sites. The most common form of CAPTCHA is an image of several distorted letters. It's your job to type the correct series of letters into a form. If your letters match the ones in the distorted image, you pass the test.



Gmail CAPTCHA
Google's Gmail service requires new users to enter a CAPTCHA before creating an account.

Why would anyone need to create a test that can tell humans and computers apart? It's because of people trying to game the system -- they want to exploit weaknesses in the computers running the site. While these individuals probably make up a minority of all the people on the Internet, their actions can affect millions of users and Web sites. For example, a free e-mail service might find itself bombarded by account requests from an automated program. That automated program could be part of a larger attempt to send out spam mail to millions of people. The CAPTCHA test helps identify which users are real human beings and which ones are computer programs.

Greetings, Program!
One of the ironies of the CAPTCHA program is that a CAPTCHA application can generate a test that even it can't solve without already knowing the answer.
One interesting thing about CAPTCHA tests is that the people who design the tests aren't always upset when their tests fail. That's because for a CAPTCHA test to fail, someone has to find a way to teach a computer how to solve the test. In other words, every CAPTCHA failure is really an advance in artificial intelligence.
Let's take a closer look at exactly what a CAPTCHA is.


CAPTCHAs and the Turing Test

CAPTCHA technology has its foundation in an experiment called the Turing Test. Alan Turing, sometimes called the father of modern computing, proposed the test as a way to examine whether or not machines can think -- or appear to think -- like humans. The classic test is a game of imitation. In this game, an interrogator asks two participants a series of questions. One of the participants is a machine and the other is a human. The interrogator can't see or hear the participants and has no way of knowing which is which. If the interrogator is unable to figure out which participant is a machine based on the responses, the machine passes the Turing Test.
Of course, with a CAPTCHA, the goal is to create a test that humans can pass easily but machines can't. It's also important that the CAPTCHA application is able to present different CAPTCHAs to different users. If a visual CAPTCHA presented a static image that was the same for every user, it wouldn't take long before a spammer spotted the form, deciphered the letters, and programmed an application to type in the correct answer automatically.


Image CAPTCHA

Not all CAPTCHAs require you to type in text. This version asks users to use a mouse to trace certain shapes found in photographs.
Most, but not all, CAPTCHAs rely on a visual test. Computers lack the sophistication that human beings have when it comes to processing visual data. We can look at an image and pick out patterns more easily than a computer. The human mind sometimes perceives patterns even when none exist, a quirk we call pareidolia. Ever see a shape in the clouds or a face on the moon? That's your brain trying to associate random information into patterns and shapes.
I'm Sorry, I'll Read That Again
Now and then, a CAPTCHA presents an image or sound that's so distorted, even humans can't decipher it. That's why many CAPTCHA applications provide users with an option to generate a new CAPTCHA and try again. Hopefully the second time around won't be as confusing as the first.


But not all CAPTCHAs rely on visual patterns. In fact, it's important to have an alternative to a visual CAPTCHA. Otherwise, the Web site administrator runs the risk of disenfranchising any Web user who has a visual impairment. One alternative to a visual test is an audible one. An audio CAPTCHA usually presents the user with a series of spoken letters or numbers. It's not unusual for the program to distort the speaker's voice, and it's also common for the program to include background noise in the recording. This helps thwart voice recognition programs.
Another option is to create a CAPTCHA that asks the reader to interpret a short passage of text. A contextual CAPTCHA quizzes the reader and tests comprehension skills. While computer programs can pick out key words in text passages, they aren't very good at understanding what those words actually mean.

Next, we'll take a closer look at the kinds of sites that use CAPTCHA to verify whether or not you have a pulse.

Who Uses CAPTCHA

One common application of CAPTCHA is for verifying online polls. In fact, a former Slashdot poll serves as an example of what can go wrong if pollsters don't implement filters on their surveys. In 1999, Slashdot published a poll that asked visitors to choose the graduate school that had the best program in computer science. Students from two universities -- Carnegie Mellon and MIT -- created automated programs called bots to vote repeatedly for their respective schools. While those two schools received thousands of votes, the other schools only had a few hundred each. If it's possible to create a program that can vote in a poll, how can we trust online poll results at all? A CAPTCHA form can help prevent programmers from taking advantage of the polling system.

Registration forms on Web sites often use CAPTCHAs. For example, free Web-based e-mail services like Hotmail, Yahoo! Mail or Gmail allow people to create an e-mail account free of charge. Usually, users must provide some personal information when creating an account, but the services typically don't verify this information. They use CAPTCHAs to try to prevent spammers from using bots to generate hundreds of spam mail accounts.


Yahoo! CAPTCHA

Yahoo uses alphanumeric strings rather than words as CAPTCHAs when you sign up for a Yahoo! account.


Ticket brokers like TicketMaster also use CAPTCHA applications. These applications help prevent ticket scalpers from bombarding the service with massive ticket purchases for big events. Without some sort of filter, it's possible for a scalper to use a bot to place hundreds or thousands of ticket orders in a matter of seconds. Legitimate customers become victims as events sell out minutes after tickets become available. Scalpers then try to sell the tickets above face value. While CAPTCHA applications don't prevent scalping, they do make it more difficult to scalp tickets on a large scale.

Some Web pages have message boards or contact forms that allow visitors to either post messages to the site or send them directly to the Web administrators. To prevent an avalanche of spam, many of these sites have a CAPTCHA program to filter out the noise. A CAPTCHA won't stop someone who is determined to post a rude message or harass an administrator, but it will help prevent bots from posting messages automatically.

The most common form of CAPTCHA requires visitors to type in a word or series of letters and numbers that the application has distorted in some way. Some CAPTCHA creators came up with a way to increase the value of such an application: digitizing books. An application called reCAPTCHA harnesses users' responses in CAPTCHA fields to verify the contents of a scanned piece of paper. Because computers aren't always able to identify words from a digital scan, humans have to verify what a printed page says. Then it's possible for search engines to search and index the contents of a scanned document.

Here's how it works: First, the administrator of the reCAPTCHA program digitally scans a book. Then, the reCAPTCHA program selects two words from the digitized image. The application already recognizes one of the words. If the visitor types that word into a field correctly, the application assumes the second word the user types is also correct. That second word goes into a pool of words that the application will present to other users. As each user types in a word, the application compares the word to the original answer. Eventually, the application receives enough responses to verify the word with a high degree of certainty. That word can then go into the verified pool.
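The verification loop described above can be sketched in a few lines of Python. This is a toy model, not reCAPTCHA's actual code: the class name, the min_votes and agreement thresholds, and the sample words are all assumptions made up for the illustration.

```python
from collections import Counter, defaultdict


class WordVerifier:
    """Toy model of the two-word check (hypothetical; not reCAPTCHA's real code)."""

    def __init__(self, min_votes=3, agreement=0.75):
        # Both thresholds are illustrative assumptions.
        self.min_votes = min_votes
        self.agreement = agreement
        self.votes = defaultdict(Counter)  # unknown word id -> tallies of typed answers
        self.verified = {}                 # unknown word id -> accepted text

    def submit(self, known_answer, typed_known, unknown_id, typed_unknown):
        """Return True if the visitor passes; record the guess for the unknown word."""
        if typed_known.strip().lower() != known_answer.lower():
            return False  # failed the control word, so the second answer is ignored
        tally = self.votes[unknown_id]
        tally[typed_unknown.strip().lower()] += 1
        best, count = tally.most_common(1)[0]
        total = sum(tally.values())
        if total >= self.min_votes and count / total >= self.agreement:
            self.verified[unknown_id] = best  # enough agreement: move to the verified pool
        return True


verifier = WordVerifier()
for guess in ["morning", "morning", "mourning", "morning"]:
    verifier.submit("overlooks", "overlooks", "scan-17", guess)
print(verifier.verified)
```

With these made-up thresholds, the word only reaches the verified pool after the fourth response, when three of four visitors agree on the same spelling.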

It sounds time consuming, but remember that in this case the CAPTCHA is pulling double duty. Not only is it verifying the contents of a digitized book, it's also verifying that the people filling out the form are actually people. In turn, those people are gaining access to a service they want to use.

Next, we'll take a look at the process that goes into creating a CAPTCHA.


Creating a CAPTCHA

The first step to creating a CAPTCHA is to look at the different ways humans and machines process information. Machines follow sets of instructions. If something falls outside the realm of those instructions, the machine isn't able to compensate. A CAPTCHA designer has to take this into account when creating a test. For example, it's easy to build a program that looks at metadata -- the information on the Web that's invisible to humans but machines can read. If you create a visual CAPTCHA and the image's metadata includes the solution, your CAPTCHA will be broken in no time.

Similarly, it's unwise to build a CAPTCHA that doesn't distort letters and numbers in some way. An undistorted series of characters isn't very secure. Many computer programs can scan an image and recognize simple shapes like letters and numbers.
Prepackaged
Installing a CAPTCHA on your Web site is as easy as copying a few lines of code into your site's HTML page. And it won't even cost you a dime -- many CAPTCHA applications are free.



One way to create a CAPTCHA is to pre-determine the images and solutions it will use. This approach requires a database that includes all the CAPTCHA solutions, which can compromise the reliability of the test. According to Microsoft Research experts Kumar Chellapilla and Patrice Simard, humans should have an 80 percent success rate at solving any particular CAPTCHA, but machines should only have a 0.01 percent success rate [source: Chellapilla and Simard]. If a spammer managed to find a list of all CAPTCHA solutions, he or she could create an application that bombards the CAPTCHA with every possible answer in a brute force attack. The database would need more than 10,000 possible CAPTCHAs to meet the qualifications of a good CAPTCHA.

Other CAPTCHA applications create random strings of letters and numbers. You aren't likely to ever get the same series twice. Using randomization eliminates the possibility of a brute force attack -- the odds of a bot entering the correct series of random letters are very low. The longer the string of characters, the less likely a bot will get lucky.
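To put rough numbers on that, here is a small Python sketch. It assumes a 36-symbol character set (uppercase letters and digits) and takes the machine target as 1 in 10,000, matching the 10,000-CAPTCHA figure above. The function names are made up for the example, and the math covers only blind guessing, not a bot that tries to read the image.

```python
import math
import secrets
import string

# Assumed character set: uppercase letters and digits (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def random_challenge(length=6):
    """Build a fresh random string for each visitor."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def guess_probability(length):
    """Chance that one blind guess matches a random string of this length."""
    return 1 / len(ALPHABET) ** length


def shortest_safe_length(max_machine_rate=0.0001):
    """Smallest length at which a blind guess succeeds no more often than the target rate."""
    return math.ceil(math.log(1 / max_machine_rate) / math.log(len(ALPHABET)))


print(random_challenge())
print(guess_probability(6))    # 1 in 36**6, or 1 in 2,176,782,336
print(shortest_safe_length())
```

Under these assumptions, even a three-character string holds blind guessing below the 1-in-10,000 mark, and every added character makes a lucky guess 36 times less likely.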
Can You Hear Me Now?
In many ways, audible CAPTCHAs are similar to visual ones. In a database approach, the CAPTCHA creator must pre-record a person or computer speaking every series of characters and then match them with the right solution. With a randomized approach, the creator pre-records each character individually and the application strings the characters together randomly to create CAPTCHAs.



CAPTCHAs take different approaches to distorting words. Some stretch and bend letters in weird ways, as if you're looking at the word through melted glass. Others put the word behind a crosshatched pattern of bars to break up the shape of the letters. A few use different colors or a field of dots to achieve the same effect. In the end, the goal is the same: to make it really hard for a computer to figure out what's in the CAPTCHA.

Designers can also create puzzles or problems that are easy for humans to solve. Some CAPTCHAs rely on pattern recognition and extrapolation. For example, a CAPTCHA might include a series of shapes and ask the user which shape among several choices would logically come next. The problem with this approach is that not all humans are good with these kinds of problems and the success rate for a human user can drop below 80 percent.

Next, we'll take a look at how computers can break CAPTCHAs.


Breaking a CAPTCHA

The challenge in breaking a CAPTCHA isn't figuring out what a message says -- after all, humans should have at least an 80 percent success rate. The really hard task is teaching a computer how to process information in a way similar to how humans think. In many cases, people who break CAPTCHAs concentrate not on making computers smarter, but reducing the complexity of the problem posed by the CAPTCHA.

Let's assume you've protected an online form using a CAPTCHA that displays English words. The application warps the font slightly, stretching and bending the letters in unpredictable ways. In addition, the CAPTCHA includes a randomly generated background behind the word.

A programmer wishing to break this CAPTCHA could approach the problem in phases. He or she would need to write an algorithm -- a set of instructions that directs a machine to follow a certain series of steps. In this scenario, one step might be to convert the image to grayscale. That means the application removes all the color from the image, taking away one of the levels of obfuscation the CAPTCHA employs.

Next, the algorithm might tell the computer to detect patterns in the black and white image. The program compares each pattern to a normal letter, looking for matches. If the program can only match a few of the letters, it might cross reference those letters with a database of English words. Then it would plug in likely candidates into the submit field. This approach can be surprisingly effective. It might not work 100 percent of the time, but it can work often enough to be worthwhile to spammers.
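Two of those phases are easy to sketch in Python: stripping the color out of an image and cross-referencing partly recognized letters against a word list. The tiny word list, the '?' placeholder for unreadable characters, and the function names are assumptions for this example; the brightness weights are one common convention, not something the CAPTCHA dictates. The hard part, recognizing the warped letters in the first place, is left out.

```python
import re

# Hypothetical word list; a real attack would load a much larger dictionary.
WORDS = ["cipher", "copper", "camper", "ticket", "puzzle", "secret"]


def to_grayscale(pixels):
    """Collapse (r, g, b) pixels to single brightness values, discarding color."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in pixels]


def candidates(partial, words=WORDS):
    """Return dictionary words that fit the letters recognized so far.

    `partial` uses '?' for each character the recognizer could not identify.
    """
    pattern = re.compile(
        "".join("." if ch == "?" else re.escape(ch) for ch in partial)
    )
    return [word for word in words if pattern.fullmatch(word)]


print(to_grayscale([[(255, 0, 0), (255, 255, 255)]]))
print(candidates("c?p?er"))
```

Here the program has only read four of six letters, yet the dictionary narrows the answer to two candidates that a bot could simply try in turn.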


Gimpy CAPTCHA
The Gimpy CAPTCHA displays 10 words, but you only have to type three in correctly to pass the test.



What about more complex CAPTCHAs? The Gimpy CAPTCHA displays 10 English words with warped fonts across an irregular background. The CAPTCHA arranges the words in pairs and the words of each pair overlap one another. Users have to type in three correct words in order to move forward. How reliable is this approach?

As it turns out, with the right CAPTCHA-cracking algorithm, it's not terribly reliable. Greg Mori and Jitendra Malik published a paper detailing their approach to cracking the Gimpy version of CAPTCHA. One thing that helped them was that the Gimpy approach uses actual words rather than random strings of letters and numbers. With this in mind, Mori and Malik designed an algorithm that tried to identify words by examining the beginning and end of the string of letters. They also used Gimpy's 500-word dictionary.

Mori and Malik ran a series of tests using their algorithm. They found that their algorithm could correctly identify the words in a Gimpy CAPTCHA 33 percent of the time [source: Mori and Malik]. While that's far from perfect, it's also significant. Spammers can afford to have only one-third of their attempts succeed if they set bots to break CAPTCHAs several hundred times every minute.
Electronic Ears
Audio CAPTCHAs aren't foolproof either. In the spring of 2008, there were reports that hackers figured out a way to beat Google's audio CAPTCHA system. To crack an audio CAPTCHA, you have to create a library of sounds representing each character in the CAPTCHA's database. Keep in mind that depending on the distortion, there might be several sounds for the same character. After categorizing each sound, the spammer uses a variation of voice-recognition software to interpret the audio CAPTCHA [source: Networkworld].



You'd think that the inventors of CAPTCHA would be upset that their hard work is being picked apart by hackers, but you'd be wrong.

CAPTCHA and Artificial Intelligence

Luis von Ahn of Carnegie Mellon University is one of the inventors of CAPTCHA. In a 2006 lecture, von Ahn talked about the relationship between things like CAPTCHA and the field of artificial intelligence (AI). Because CAPTCHA is a barrier between spammers or hackers and their goal, these people have dedicated time and energy toward breaking CAPTCHAs. Their successes mean that machines are getting more sophisticated. Every time someone figures out how to teach a machine to defeat a CAPTCHA, we move one step closer to artificial intelligence.


EZ-Gimpy CAPTCHA

Hackers have found ways to teach computers how to recognize the text in EZ-Gimpy CAPTCHAs.


As people find new ways to get around CAPTCHA, computer scientists like von Ahn develop CAPTCHAs that address other challenges in the field of AI. A step backward for CAPTCHA is still a step forward for AI -- every defeat is also a victory [source: Human Computation].

But what about Web administrators? They might not find von Ahn's philosophy to be nearly as attractive. From their perspective, they still have to deal with a massive problem -- spammers and hackers. People who maintain Web sites or create online polls need to be aware that several CAPTCHA systems are no longer effective. It's important to do a little research on which CAPTCHA applications are still reliable. And it's equally important to keep up to date on the subject. If one CAPTCHA system fails, the administrator might need to remove the code from his or her site and replace it with another version.

As for CAPTCHA designers, they have to walk a fine line. As computers become more sophisticated, the testing method must also evolve. But if the test evolves to the point where humans can no longer solve a CAPTCHA with a decent success rate, the system as a whole fails. The answer may not involve warping or distorting text -- it might require users to solve a mathematical equation or answer questions about a short story. And as these tests get more complicated, there's a risk of losing user interest. How many people will still want to post a reply to a message board if they must first solve a quadratic equation?
Shall We Play a Game?
Luis von Ahn has a reputation for harnessing human computation as a way to advance computer technology. How do you convince people to help you make machines smarter? Turn it into a game! Here are a few of the games von Ahn has worked on that make computer programs more effective:
  • The ESP Game, which pairs players up, shows each player a picture, and challenges the players to come up with the same tags to describe that picture. Each verified tag helps categorize the photo for search engines.
  • Then there's Verbosity. One player describes a word to another player using a series of clues. The other player must guess the correct word.
  • The Matchin game presents the same two photos to two different players. Each player picks the photo that he or she likes the most. Both players earn points for every match. As the game gathers results, it categorizes photos from most attractive to least attractive.



Eventually, we might reach a point where computers and humans perceive puzzles the same way. If that happens, tests like CAPTCHA will become useless lines of code. Until then, we'll just have to squint (or listen) carefully while trying to decipher CAPTCHA codes.




Working of Code Breakers

How Code Breakers Work

Information is an important commodity. Nations, corporations and individuals protect secret information with encryption, using a variety of methods ranging from substituting one letter for another to using a complex algorithm to encrypt a message. On the other side of the information equation are people who use a combination of logic and intuition to uncover secret information. These people are cryptanalysts, also known as code breakers.
Binary Code
Binary code is the basis for many modern ciphers.
A person who communicates through secret writing is called a cryptographer. Cryptographers might use codes, ciphers or a combination of both to keep messages safe from others. What cryptographers create, cryptanalysts attempt to unravel.
Throughout the history of cryptography, people who created codes or ciphers were often convinced their systems were unbreakable. Cryptanalysts have proven these people wrong by relying on everything from the scientific method to a lucky guess. Today, even the amazingly complex encryption schemes common in Internet transactions may have a limited useful lifetime -- quantum computing might make solving such difficult equations a snap.

You Say Cryptology, I Say Cryptography
In English, the words cryptology and cryptography are often interchangeable -- both refer to the science of secret writing. Some people prefer to differentiate the words, using cryptology to refer to the science and cryptography to refer to the practice of secret writing.
In this article, we'll look at some of the most popular codes and cipher systems used throughout history. We'll learn about the techniques cryptanalysts use to break codes and ciphers, and what steps cryptographers can take to make their messages more difficult to figure out. At the end, you'll get the chance to take a crack at an enciphered message.

To learn how code breakers crack secret messages, you need to know how people create codes. In the next section, we'll learn about some of the earliest attempts at hiding messages.


Polybius Squares and Caesar Shifts

Although historical findings show that several ancient civilizations used elements of ciphers and codes in their writing, code experts say that these examples were meant to give the message a sense of importance and formality. The person writing the message intended for his audience to be able to read it.
The Greeks were one of the first civilizations to use ciphers to communicate in secrecy. A Greek scholar named Polybius proposed a system for enciphering a message in which a cryptographer represented each letter with a pair of numbers ranging from one to five using a 5-by-5 square (the letters I and J shared a square). The Polybius Square (sometimes called the checkerboard) looks like this:
     1    2    3    4    5
1    A    B    C    D    E
2    F    G    H    I/J  K
3    L    M    N    O    P
4    Q    R    S    T    U
5    V    W    X    Y    Z
A cryptographer would write the letter "B" as "12". The letter O is "34". To encipher the phrase "How Stuff Works," the cryptographer would write "233452 4344452121 5234422543." Because he replaces each letter with two numbers, it's difficult for someone unfamiliar with the code to determine what this message means. The cryptographer could make it even more difficult by mixing up the order of the letters instead of writing them out alphabetically.
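Here is a short Python sketch of the lookup, using the alphabetical square shown above. The function name is made up for the example, and dropping punctuation other than spaces is an assumption.

```python
# 5-by-5 square from the article; I and J share one cell.
SQUARE = ["ABCDE", "FGHIK", "LMNOP", "QRSTU", "VWXYZ"]


def polybius_encipher(text):
    """Replace each letter with its row and column digits; keep spaces between words."""
    out = []
    for ch in text.upper():
        if ch == "J":
            ch = "I"  # J shares a cell with I
        for row, letters in enumerate(SQUARE, start=1):
            col = letters.find(ch)
            if col != -1:
                out.append(f"{row}{col + 1}")
                break
        else:
            # Not a letter in the square: keep spaces, drop everything else.
            if ch == " ":
                out.append(" ")
    return "".join(out)


print(polybius_encipher("How Stuff Works"))  # 233452 4344452121 5234422543
```

Scrambling the letters inside SQUARE, as the paragraph above suggests, changes the output without changing the method.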
Julius Caesar invented another early cipher -- one that was very simple and yet confounded his enemies. He created enciphered messages by shifting the order of the alphabet by a certain number of letters. For example, if you were to shift the English alphabet down three places, the letter "D" would represent the letter "A," while the letter "E" would mean "B" and so forth. You can visualize this code by writing the two alphabets on top of one another with the corresponding plaintext and cipher matching up like this:
Plaintext   a b c d e f g h i j k l m
Cipher      D E F G H I J K L M N O P

Plaintext   n o p q r s t u v w x y z
Cipher      Q R S T U V W X Y Z A B C

Notice that the cipher alphabet wraps around to "A" after reaching "Z." Using this cipher system, you could encipher the phrase "How Stuff Works" as "KRZ VWXII ZRUNV."
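The shift and the wraparound are simple to express in Python. This is a minimal sketch; the function names are made up for the example, and leaving spaces and punctuation untouched is an assumption.

```python
import string


def caesar_encipher(text, shift=3):
    """Shift each letter forward by `shift` places, wrapping from Z back to A."""
    out = []
    for ch in text.upper():
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)


def caesar_decipher(text, shift=3):
    """Undo the shift by moving each letter back the same number of places."""
    return caesar_encipher(text, -shift)


print(caesar_encipher("How Stuff Works"))  # KRZ VWXII ZRUNV
print(caesar_decipher("KRZ VWXII ZRUNV"))  # HOW STUFF WORKS
```

The modulo operation does the wrapping: shifting "X" three places lands past "Z", so the count starts over at "A".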
Both of these systems, the Polybius Square and the Caesar Shift, formed the basis of many future cipher systems.
In the next section, we'll look at a few of these more advanced methods of encryption.

Deciphering the Language
To encipher a message means to replace the letters in the text with the replacement alphabet. The readable message is called the plaintext. The cryptographer converts the plaintext into a cipher and sends it on. The recipient of the message uses the proper technique, called the key, to decipher the message, changing it from a cipher back into a plaintext.


The Trimethius Tableau

After the fall of the Roman Empire, the Western world entered what we now call the Dark Ages. During this time, scholarship declined and cryptography suffered the same fate. It wasn't until the Renaissance that cryptography again became popular. The Renaissance was not only a period of intense creativity and learning, but also of intrigue, politics, warfare and deception.
Cryptographers began to search for new ways to encipher messages. The Caesar Shift was too easy to crack -- given enough time and patience, almost anyone could uncover the plaintext behind the ciphered text. Kings and priests hired scholars to come up with new ways to send secret messages.
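To see why patience was all it took, consider that a shift cipher has only 25 possible keys. The Python sketch below, with made-up function names, simply tries every one and prints the results; a person scanning the list would spot the readable line right away.

```python
def shift_back(text, shift):
    """Move every letter back by `shift` places, leaving other characters as-is."""
    return "".join(
        chr((ord(ch) - ord("A") - shift) % 26 + ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text.upper()
    )


def brute_force(ciphertext):
    """Pair each of the 25 possible shifts with the text it produces."""
    return [(shift, shift_back(ciphertext, shift)) for shift in range(1, 26)]


for shift, guess in brute_force("KRZ VWXII ZRUNV"):
    print(f"{shift:2d}  {guess}")
```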
One such scholar was Johannes Trimethius (a name more commonly spelled Trithemius), who proposed laying out the alphabet in a matrix, or tableau. The matrix was 26 rows long and 26 columns wide. The first row contained the alphabet as it is normally written. The next row used a Caesar Shift to move the alphabet over one space. Each row shifted the alphabet another spot so that the final row began with "Z" and ended in "Y." You could read the alphabet normally by looking across the first row or down the first column. It looks like this:


Trimethius Tableau
As you can see, each row is a Caesar Shift. To encipher a letter, the cryptographer picks a row and uses the top row as the plaintext guide. A cryptographer using the 10th row, for example, would encipher the plaintext letter "A" as "J." Trimethius didn't stop there -- he suggested that cryptographers encipher messages by using the first row for the first letter, the second row for the second letter, and so on down the tableau. After 26 consecutive letters, the cryptographer would start back at the first row and work down again until he had enciphered the entire message. Using this method, he could encipher the phrase "How Stuff Works" as "HPY VXZLM EXBVE."
Trimethius' tableau is a good example of a polyalphabetic cipher. Most early ciphers were monoalphabetic, meaning that one cipher alphabet replaced the plaintext alphabet. A polyalphabetic cipher uses multiple alphabets to replace the plaintext. Although the same letters are used in each row, the letters of that row have a different meaning. A cryptographer enciphers a plaintext "A" in row three as a "C," but an "A" in row 23 is a "W." Trimethius' system therefore uses 26 alphabets -- one for each letter in the normal alphabet.
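The tableau and the row-by-row scheme can be rebuilt with a short Python sketch. The function names are made up for the example, and letting spaces pass through without using up a row is an assumption that matches the example above.

```python
import string

ALPHABET = string.ascii_uppercase


def tableau():
    """Build the 26 rows; row n (counting from 0) is the alphabet shifted n places."""
    return [ALPHABET[n:] + ALPHABET[:n] for n in range(26)]


def progressive_encipher(text):
    """Use row 1 for the 1st letter, row 2 for the 2nd, and so on, cycling after 26."""
    rows = tableau()
    out, position = [], 0
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(rows[position % 26][ALPHABET.index(ch)])
            position += 1  # only letters advance down the tableau
        else:
            out.append(ch)
    return "".join(out)


print("\n".join(tableau()))                     # the full 26-by-26 tableau
print(tableau()[9][0])                          # J: the 10th row turns A into J
print(progressive_encipher("How Stuff Works"))  # HPY VXZLM EXBVE
```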
In the next section, we'll learn how a scholar named Vigenère created a complex polyalphabetic cipher.


The Vigenère Cipher

In the late 1500s, Blaise de Vigenère proposed a polyalphabetic system that is particularly difficult to decipher. His method used a combination of the Trimethius tableau and a key. The key determined which of the alphabets in the table the decipherer should use, but wasn't necessarily part of the actual message.
Let's assume you are encrypting a message using the key word "CIPHER." You would encipher the first letter using the "C" row as a guide, using the letter found at the intersection of the "C" row and the corresponding plaintext letter's column. For the second letter, you'd use the "I" row, and so on. Once you use the "R" row to encipher a letter, you'd start back at "C". Using this key word and method, you could encipher "How Stuff Works" this way:
Key      C I P H E R C I P H E R C
Plain    H O W S T U F F W O R K S
Cipher   J W L Z X L H N L V V B U

Your enciphered message would read, "JWL ZXLHN LVVBU." If you wanted to write a longer message, you'd keep repeating the key over and over to encipher your plaintext. The recipient of your message would need to know the key beforehand in order to decipher the text.
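Because each row of the tableau is just a shift, the key word method boils down to adding the key letter's position (A = 0, B = 1 and so on) to each plaintext letter. Here is a minimal Python sketch; the function names are made up for the example, and it assumes spaces do not use up a key letter, as in the table above.

```python
from itertools import cycle


def vigenere_encipher(plaintext, key):
    """Shift each letter by the matching key letter, repeating the key as needed."""
    key_letters = cycle(key.upper())
    out = []
    for ch in plaintext.upper():
        if "A" <= ch <= "Z":
            shift = ord(next(key_letters)) - ord("A")
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # spaces pass through and do not consume a key letter
    return "".join(out)


def vigenere_decipher(ciphertext, key):
    """Reverse the process by subtracting each key letter's shift."""
    key_letters = cycle(key.upper())
    out = []
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            shift = ord(next(key_letters)) - ord("A")
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


print(vigenere_encipher("How Stuff Works", "CIPHER"))  # JWL ZXLHN LVVBU
print(vigenere_decipher("JWL ZXLHN LVVBU", "CIPHER"))  # HOW STUFF WORKS
```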
Vigenère suggested an even more complex scheme that used a priming letter followed by the message itself as the key. The priming letter designated the row the cryptographer first used to begin the message. Both the cryptographer and the recipient knew which priming letter to use beforehand. This method made cracking ciphers extremely difficult, but it was also time-consuming, and one error early in the message could garble everything that followed. While the system was secure, most people found it too complex to use effectively. Here is an example of Vigenère's system -- in this case the priming letter is "D":
Key      D H O W S T U F F W O R K
Plain    H O W S T U F F W O R K S
Cipher   K V K O L N Z K B K F B C

To decipher, the recipient would first look at the first letter of the encrypted message, a "K" in this case, and use the Trimethius table to find where the "K" fell in the "D" row -- remember, both the cryptographer and recipient know beforehand that the first letter of the key will always be "D," no matter what the rest of the message says. The letter at the top of that column is "H." The "H" becomes the next letter in the cipher's key, so the recipient would look at the "H" row next and find the next letter in the cipher -- a "V" in this case. That would give the recipient an "O." Following this method, the recipient can decipher the entire message, though it takes some time.
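The priming-letter scheme can be sketched the same way in Python. The function names are made up for the example, and the sketch drops spaces so that the key and the plaintext line up letter for letter, as they do in the table above.

```python
A = ord("A")


def autokey_encipher(plaintext, primer):
    """Encipher with a key made of the priming letter followed by the plaintext itself."""
    letters = [ch for ch in plaintext.upper() if "A" <= ch <= "Z"]
    key = [primer.upper()] + letters[:-1]
    return "".join(
        chr((ord(p) - A + ord(k) - A) % 26 + A) for p, k in zip(letters, key)
    )


def autokey_decipher(ciphertext, primer):
    """Recover one letter at a time; each recovered letter becomes the next key letter."""
    key, out = primer.upper(), []
    for c in ciphertext.upper():
        if not "A" <= c <= "Z":
            continue  # ignore spaces and punctuation
        p = chr((ord(c) - ord(key)) % 26 + A)
        out.append(p)
        key = p
    return "".join(out)


print(autokey_encipher("How Stuff Works", "D"))  # KVKOLNZKBKFBC
print(autokey_decipher("KVKOLNZKBKFBC", "D"))    # HOWSTUFFWORKS
```

The decipher loop also shows the system's weakness: every letter depends on the one before it, so a single mistake throws off the rest of the message.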
The more complex Vigenère system didn't catch on until the 1800s, but it's still used in modern cipher machines.
Next, we'll learn about the ADFGX code created by Germany during World War I.


ADFGX Cipher

After the invention of the telegraph, it became possible for individuals to communicate across entire countries instantaneously using Morse code. Unfortunately, it was also possible for anyone with the right equipment to wiretap a line and listen in on exchanges. Moreover, most people had to rely on clerks to encode and decode messages, making it impossible to send plaintext clandestinely. Once again, ciphers became important.
Germany created a new cipher based on a combination of the Polybius checkerboard and ciphers using key words. It was known as the ADFGX cipher, because those were the only letters used in the cipher. The Germans chose these letters because their Morse code equivalents are difficult to confuse, reducing the chance of errors.
The first step was to create a matrix that looked a lot like the Polybius checkerboard:
     A    D    F    G    X
A    A    B    C    D    E
D    F    G    H    I/J  K
F    L    M    N    O    P
G    Q    R    S    T    U
X    V    W    X    Y    Z
Cryptographers would use pairs of cipher letters to represent plaintext letters. The label of the letter's row becomes the first cipher letter in the pair, and the label of its column becomes the second. In this example, the enciphered letter "B" becomes "AD," while "O" becomes "FG." Not all ADFGX matrices had the alphabet plotted in alphabetical order.
Next, the cryptographer would encipher his message. Let's stick with "How Stuff Works." Using this matrix, we'd get "DFFGXD GFGGGXDADA XDFGGDDXGF."
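This substitution step looks like the following in Python. It is a sketch that assumes the alphabetical matrix from the example; the function name is made up for the illustration.

```python
LABELS = "ADFGX"
# Alphabetical layout from the example; I and J share one cell.
SQUARE = ["ABCDE", "FGHIK", "LMNOP", "QRSTU", "VWXYZ"]


def adfgx_substitute(text):
    """Replace each letter with its row label followed by its column label."""
    words = []
    for word in text.upper().replace("J", "I").split():
        pairs = []
        for ch in word:
            for r, letters in enumerate(SQUARE):
                c = letters.find(ch)
                if c != -1:
                    pairs.append(LABELS[r] + LABELS[c])
        words.append("".join(pairs))
    return " ".join(words)


print(adfgx_substitute("How Stuff Works"))  # DFFGXD GFGGGXDADA XDFGGDDXGF
```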
The next step was to determine a key word, which could be any length but couldn't include any repeated letters. For this example, we'll use the word DEUTSCH. The cryptographer would create a grid with the key word spelled across the top. The cryptographer would then write the enciphered message into the grid, splitting the cipher pairs into individual letters and wrapping around from one row to the next.
    D E U T S C H
    D F F G X D G
    F G G G X D A
    D A X D F G G
    D D X G F
Next, the cryptographer would rearrange the grid so that the letters of the key word were in alphabetical order, shifting the letters' corresponding columns accordingly:
    C D E H S T U
    D D F G X G F
    D F G A X G G
    G D A G F D X
      D D   F G X
He would then write out the message by following down each column (disregarding the letters of the key word on the top row). This message would come out as "DDG DFDD FGAD GAG XXFF GGDG FGXX." It's probably clear why this code was so challenging -- cryptographers enciphered and transposed every plaintext character. To decode, you would need to know the key word (DEUTSCH), then you'd work backward from there. You'd start with a grid with the columns arranged alphabetically. Once you filled it out, you could rearrange the columns properly and use your matrix to decipher the message.
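The column shuffle can also be sketched in Python. The function names are made up for the example, and the sketch relies on two assumptions: the key word has no repeated letters, and the spaces between ciphered words survive transmission so the recipient knows where each column ends.

```python
def transpose(stream, keyword):
    """Write `stream` under the key word row by row, then read columns in alphabetical key order."""
    width = len(keyword)
    # Column i holds every `width`-th letter of the stream, starting at position i.
    columns = {letter: stream[i::width] for i, letter in enumerate(keyword)}
    return " ".join(columns[letter] for letter in sorted(keyword))


def untranspose(ciphertext, keyword):
    """Put the columns back in key-word order and read the grid row by row."""
    columns = dict(zip(sorted(keyword), ciphertext.split()))
    ordered = [columns[letter] for letter in keyword]
    rows = max(len(col) for col in ordered)
    return "".join(col[r] for r in range(rows) for col in ordered if r < len(col))


stream = "DFFGXDGFGGGXDADAXDFGGDDXGF"  # "How Stuff Works" after the matrix step
print(transpose(stream, "DEUTSCH"))    # DDG DFDD FGAD GAG XXFF GGDG FGXX
print(untranspose("DDG DFDD FGAD GAG XXFF GGDG FGXX", "DEUTSCH") == stream)  # True
```

After untransposing, the recipient splits the stream back into pairs and looks each pair up in the matrix to recover the plaintext.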


Words Count

One of the ways you can guess at a key word in an ADFGX cipher is to count the number of words in the ciphered message. The number of ciphered words will tell you how long the key word is -- each ciphered word represents a column of text, and each column corresponds to a letter in the key word. In our example, there are seven words in the ciphered message, meaning there are seven columns with a seven-letter key word. Sure enough, DEUTSCH has seven letters. Because the ciphered words and the original message can have different word counts -- seven ciphered words versus three plaintext words in our example -- deciphering the message becomes more challenging.
In the next section, we'll look at some of the devices cryptographers have invented to create puzzling ciphers.




Cipher Machines

One of the earliest cipher devices known is the Alberti Disc, invented by Leon Battista Alberti in the 15th century. The device consisted of two discs, the inner one containing a scrambled alphabet and the outer one a second, truncated alphabet and the numbers 1 to 4. The outer disc rotated to line up different letters with those on the inner circle. The cryptographer treated the inner circle's letters as plaintext, and the outer disc's letters then served as the cipher text.


Da Vinci Code

Dan Brown's novel "The Da Vinci Code" follows the adventures of a symbology professor as he solves codes and ciphers, some of which he breaks using a Cardano Grille.

Because the inner disc's alphabet was scrambled, the recipient would need an identical copy of the disc the cryptographer used to decipher the message. To make the system more secure, the cryptographer could change the disc's alignment in the middle of a message, perhaps after three or four words. The cryptographer and recipient would know to change the disc settings after a prescribed number of words, perhaps first setting the disc so that the inner circle "A" matched with the outer circle "W" for the first four words, then with "N" for the next four, and so on. This made cracking the cipher much more difficult.
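A rough Python sketch of the disc with periodic realignment follows. It simplifies the device: both rings carry the full 26-letter alphabet, the scrambled INNER ring is a made-up ordering, and the function names are hypothetical. As in the description above, inner letters are plaintext and outer letters are cipher text.

```python
import string

OUTER = string.ascii_uppercase          # outer ring, read off as cipher text
INNER = "QWERTYUIOPASDFGHJKLZXCVBNM"    # hypothetical scrambled inner ring


def _shift(outer_mark: str) -> int:
    """Rotation that puts outer_mark directly against the inner ring's 'A'."""
    return OUTER.index(outer_mark) - INNER.index("A")


def encipher_word(word: str, outer_mark: str) -> str:
    return "".join(OUTER[(INNER.index(ch) + _shift(outer_mark)) % 26] for ch in word)


def decipher_word(word: str, outer_mark: str) -> str:
    return "".join(INNER[(OUTER.index(ch) - _shift(outer_mark)) % 26] for ch in word)


def run_disc(message: str, settings: str, word_fn, words_per_setting: int = 4) -> str:
    """Apply word_fn word by word, moving to the next setting every few words."""
    out = []
    for i, word in enumerate(message.split()):
        mark = settings[(i // words_per_setting) % len(settings)]
        out.append(word_fn(word, mark))
    return " ".join(out)


message = "MEET ME AT THE OLD BRIDGE AT NOON"  # capital letters and spaces only
secret = run_disc(message, "WN", encipher_word)
print(secret)
print(run_disc(secret, "WN", decipher_word))   # round-trips to the original message
```

The word "AT" appears twice in the message but falls under two different settings ("W", then "N"), so it enciphers to two different letter groups. That is what makes the changing alignment harder to crack.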

Cardano Grilles and Steganography

A clever way to hide a secret message is in plain sight. One way to do this is to use a Cardano Grille -- a piece of paper or cardboard with holes cut out of it. To cipher a message, you lay a grille on a blank sheet of paper and write out your message through the grille's holes. You fill the rest of the paper with innocent text. When your recipient receives the message, he lays an identical grille over it to see the secret text. This is a form of steganography, hiding a message within something else.
In the late 18th century, Thomas Jefferson proposed a new ciphering machine. It was a cylinder of discs mounted on a spindle. On the edge of each disc were the letters of the alphabet, arranged in random sequence. A cryptographer could align the discs to spell out a short message across the cylinder. He would then look at another row across the cylinder, which would appear to be gibberish, and send that to the recipient. The recipient would use an identical cylinder to spell out the series of nonsense letters, then scan the rest of the cylinder, looking for a message spelled out in English. In 1922, the United States Army adopted a device very similar to Jefferson's; other branches of the military soon followed suit.
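Here is a small Python sketch of the cylinder idea. The disc alphabets are generated from a seeded random shuffle purely for illustration (they are not Jefferson's), and make_cylinder and read_row are hypothetical names.

```python
import random
import string


def make_cylinder(n_discs: int, seed: int = 0) -> list[str]:
    """Build n_discs discs, each a shuffled alphabet (illustrative only)."""
    rng = random.Random(seed)
    return ["".join(rng.sample(string.ascii_uppercase, 26)) for _ in range(n_discs)]


def read_row(cylinder: list[str], aligned: str, offset: int) -> str:
    """With `aligned` spelled across one row, read the row `offset` steps around."""
    return "".join(
        disc[(disc.index(ch) + offset) % 26] for disc, ch in zip(cylinder, aligned)
    )


cylinder = make_cylinder(n_discs=10)
message = "ATTACKNOON"                       # one letter per disc
sent = read_row(cylinder, message, offset=7)  # the gibberish row that gets sent

# The recipient aligns the gibberish and scans the other 25 rows for English.
candidates = [read_row(cylinder, sent, k) for k in range(1, 26)]
print(message in candidates)
# True
```

The recipient does not need to know which row the sender picked; only one of the 25 remaining rows should read as sensible text.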
Perhaps the most famous ciphering device was Germany's Enigma Machine from the early 20th century. The Enigma Machine resembled a typewriter, but instead of printing on paper it had a panel of lights, each stamped with a letter. Pressing a key caused an electric current to run through a complex system of wires and gears, resulting in a ciphered letter illuminating. For instance, you might press the key for the letter "A" and see "T" light up.



What made the Enigma Machine such a formidable ciphering device was that once you pressed a letter, a rotor in the machine would turn, changing the electrode contact points inside the machine. This means if you pressed "A" a second time, a different letter would light up instead of "T." Each time you typed a letter, the rotor turned, and after a certain number of letters, a second rotor engaged, then a third. The machine allowed the operator to switch how letters fed into the machine, so that when you pressed one letter, the machine would interpret it as if you had pressed a different letter.
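The effect of a turning rotor can be shown with a toy model in Python. This is not an Enigma emulation: it uses a single rotor with no second or third rotor and no letter-switching panel, and the WIRING permutation is just an example chosen for illustration.

```python
import string

ALPHABET = string.ascii_uppercase
WIRING = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"  # example permutation, an assumption


def press_keys(keys: str, start: int = 0) -> str:
    """Return the letters that would light up for a run of key presses."""
    position = start
    lit = []
    for ch in keys:
        position = (position + 1) % 26            # the rotor turns on every press
        entry = (ALPHABET.index(ch) + position) % 26
        exit_contact = ALPHABET.index(WIRING[entry])
        lit.append(ALPHABET[(exit_contact - position) % 26])
    return "".join(lit)


print(press_keys("AAA"))
# JKC
```

Pressing "A" three times lights three different letters, because each turn of the rotor changes which internal wire the key is connected to.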
How does a cryptanalyst crack such a difficult code? In the next section, we'll learn how codes and ciphers are broken.

Cryptanalysis

While there are hundreds of different codes and cipher systems in the world, there are some universal traits and techniques cryptanalysts use to solve them. Patience and perseverance are two of the most important qualities in a cryptanalyst. Solving a cipher can take a lot of time, sometimes requiring you to retrace your steps or start over. It is tempting to give up when you are faced with a particularly challenging cipher.
Another important skill to have is a strong familiarity with the language in which the plaintext is written. Trying to solve a coded message written in an unfamiliar language is almost impossible.

Navajo Code Talkers
During World War II, the United States employed Navajo Native Americans to encode messages. The Navajos used a code system based on how their language translated into English. They assigned terms like "airplane" to code words such as "Da-he-tih-hi," which means "Hummingbird." To encipher words that didn't have a corresponding code word, they used an encoded alphabet. This encoded alphabet used Navajo translations of English words to represent letters; for instance, the Navajo word "wol-la-chee" meant "ant," so "wol-la-chee" could stand for the letter "a." Some letters were represented by multiple Navajo words. The Navajo language was so foreign to the Japanese, they never broke the code [source: Kahn].
A strong familiarity with a language includes a grasp of the language's redundancy.
Redundancy means that every language contains more characters or words than are actually needed to convey information. The rules of the English language create redundancy -- for example, no English word will begin with the letters "ng." English also relies heavily on a small number of words. Words like "the," "of," "and," "to," "a," "in," "that," "it," "is," and "I" account for more than one quarter of the text of an average message written in English.
Knowing the redundant qualities of a language makes a cryptanalyst's task much easier. No matter how convoluted the cipher is, it follows some language's rules in order for the recipient to understand the message. Cryptanalysts look for patterns within ciphers to find common words and letter pairings.
One basic technique in cryptanalysis is frequency analysis. Every language uses certain letters more often than others. In English, the letter "e" is the most common letter. By counting up the characters in a text, a cryptanalyst can see very quickly what sort of cipher he has. If the distribution of cipher frequency is similar to the distribution of the frequency of a normal alphabet, the cryptanalyst may conclude that he's dealing with a monoalphabetic cipher.
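Counting letters is easy to automate. The following Python sketch (the function name letter_frequencies is made up here) tallies the letters in a text and lists them from most to least common.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, share of all letters) pairs, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(ch, n / total) for ch, n in counts.most_common()]


sample = "XLI UYMGO FVSAR JSB NYQTW SZIV XLI PEDC HSK"  # any intercepted text
for letter, share in letter_frequencies(sample)[:3]:
    print(letter, round(share, 2))
```

If one cipher letter stands out in a long message, a reasonable first guess for a monoalphabetic cipher is that it stands for "e." Short messages give much less reliable counts.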


Frequency Table
This chart shows the frequency with which each letter in the English language is used.

In the next section, we'll look at more complex cryptanalysis and the role luck plays in breaking a cipher.

Tricks of the Trade

Cryptographers use many methods to confuse cryptanalysts. Acrophony is a method that encodes a letter by using a word that starts with that letter's sound. "Bat" might stand for "b," while "cunning" could stand for "k." A polyphone is a symbol that represents more than one letter of plaintext -- a "%" might represent both an "r" and a "j" for example, whereas homophonic substitution uses different ciphers to represent the same plaintext letter -- "%" and "&" could both represent the letter "c." Some cryptographers even throw in null symbols that don't mean anything at all.
 

Breaking the Code

More complicated ciphers require a combination of experience, experimentation and the occasional shot-in-the-dark guess. The most difficult ciphers are short, continuous blocks of characters. If the cryptographer's message includes word breaks (spaces between the enciphered words), deciphering becomes much easier. The cryptanalyst looks for groups of repeated ciphers, analyzes where those groups of letters fall within the context of words and makes guesses at what those letters might mean. If the cryptanalyst has a clue about the message's content, he might look for certain words. A cryptanalyst intercepting a message from a Navy captain to command might look for terms referring to weather patterns or sea conditions. If he guesses that "hyuwna" means "stormy," he might be able to crack the rest of the cipher.
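Testing a guessed word against a cipher word can be sketched in Python. The example assumes a simple one-for-one letter substitution, and the helper names are hypothetical. A guess only fits if its letters repeat in the same positions as the cipher word's.

```python
from typing import Optional


def letter_pattern(word: str) -> tuple:
    """Number each letter by order of first appearance: 'hello' -> (0, 1, 2, 2, 3)."""
    first_seen: dict = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word.lower())


def crib_mapping(cipher_word: str, guess: str) -> Optional[dict]:
    """Return a cipher-to-plain letter mapping if the guess fits, else None."""
    if letter_pattern(cipher_word) != letter_pattern(guess):
        return None
    return dict(zip(cipher_word.lower(), guess.lower()))


def apply_mapping(text: str, mapping: dict) -> str:
    """Fill in known letters and mark unknown ones with underscores."""
    return "".join(mapping.get(ch, "_") if ch.isalpha() else ch for ch in text.lower())


mapping = crib_mapping("hyuwna", "stormy")
print(apply_mapping("yuwn hyuwna xa", mapping))
# torm stormy _y
print(crib_mapping("hyuwna", "breeze"))
# None
```

Each successful guess fills in a few more letters elsewhere in the message, which makes the next guess easier.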


Rosslyn Chapel

Breaking the code carved into the ceiling of the Rosslyn Chapel in Scotland reveals a series of musical passages. 
Many polyalphabetic ciphers rely on key words, which makes the message vulnerable. If the cryptanalyst guesses the key word, he can quickly decipher the entire message. It's important for cryptographers to change key words frequently and to use uncommon or nonsense key words. But remembering a nonsense key word can be challenging, and if you make your cipher system so difficult that your recipient can't decipher the message quickly, your communication system fails.
Cryptanalysts take advantage of any opportunity to solve a cipher. If the cryptographer used a ciphering device, a savvy cryptanalyst will try to get the same device or make one based on his theories of the cryptographer's methodology. In the years before World War II, Polish cryptanalysts reconstructed the Enigma Machine and worked out much of Germany's ciphering system. In 1939, when it became too dangerous to continue, the Poles shared their information and technology with the Allies, who built their own machines and deciphered many of Germany's coded messages.
Modern high-level encryption methods rely on mathematical processes that are relatively simple to perform, but extremely difficult to reverse without the key. Public-key encryption is a good example. It uses two keys -- one for encoding a message and another for decoding. The encoding key is the public key, available to whoever wants to communicate with the holder of the secret key. The secret key decodes messages encrypted by the public key and vice versa. For more information on public-key encryption, see How Encryption Works.
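A toy example in Python shows the two-key idea. It uses RSA-style arithmetic, which this article does not name, with deliberately tiny primes chosen for illustration; real systems use enormous numbers and extra safeguards, so treat this strictly as a sketch. It needs Python 3.8 or later for the modular inverse.

```python
# Two small primes, kept secret by the key holder (illustrative values).
p, q = 61, 53
n = p * q                  # 3233, shared as part of both keys
phi = (p - 1) * (q - 1)    # 3120, easy to compute only if you know p and q

e = 17                     # public exponent
d = pow(e, -1, phi)        # secret exponent: the inverse of e modulo phi


def apply_key(value: int, exponent: int, modulus: int = n) -> int:
    """Encoding and decoding are the same operation with different exponents."""
    return pow(value, exponent, modulus)


message = 65                           # a message encoded as a number below n
ciphertext = apply_key(message, e)     # anyone can do this with the public key
recovered = apply_key(ciphertext, d)   # only the secret-key holder can undo it
print(ciphertext, recovered)
# 2790 65
```

Anyone who could split n back into 61 and 53 could work out the secret exponent. With numbers this small that is trivial, but with numbers hundreds of digits long it is far beyond what ordinary computers can do.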
The complex algorithms cryptographers use ensure secrecy for now. That will change if quantum computing becomes a reality. Quantum computers could find the factors of a large number much faster than a classical computer. If engineers build a reliable quantum computer, practically every encrypted message on the Internet will be vulnerable. To learn more about how cryptographers plan to deal with this problem, read How Quantum Encryption Works.
In the next section, we'll look at some codes and ciphers that remain unsolved, much to cryptanalysts' chagrin.

Famous Unsolved Codes

While most cryptanalysts will tell you that, theoretically, there's no such thing as an unbreakable code, a few cryptographers have created codes and ciphers that no one has managed to crack. In most cases, there's just not enough text in the message for cryptanalysts to analyze. Sometimes, the cryptographer's system is too complex, or there may be no message at all -- the codes and ciphers could be hoaxes.
In the 1800s, a pamphlet with three encrypted messages began to show up in a small community in Virginia. The pamphlet described the adventures of a man named Beale who'd struck it rich panning for gold. Reportedly, Beale had hidden most of his wealth in a secret location and left a coded message leading to the treasure's location with an innkeeper. Twenty years passed with no word from Beale, and the innkeeper sought out help solving the coded messages. Eventually, someone determined that one of the messages used the Declaration of Independence as a code book, but the deciphered message only gave vague hints at the location of the treasure and claimed that the other messages would lead directly to it. No one has solved either of the other messages, and many believe the whole thing to be a hoax.


Zodiac Cipher
The Zodiac killer sent ciphered messages like this one to San Francisco newspapers in the 1960s.
In the late 1960s, residents of San Francisco and surrounding counties were terrified of a vicious killer who taunted police with coded messages. The killer called himself the Zodiac and sent most of his letters to San Francisco newspapers, occasionally dividing up one long ciphered message between three papers. Allegedly, the ciphers perplexed law enforcement and intelligence agencies, though amateur cryptanalysts managed to crack most of them. There are a few messages that have never been solved, some supposedly a clue to the killer's identity.
Richard Feynman, physicist and pioneer in the field of nanotechnology, received three encoded messages from a scientist at Los Alamos and shared them with his graduate students when he couldn't decipher them himself. Currently, they are posted on a puzzle site. Cryptanalysts have only managed to decipher the first message, which turned out to be the opening lines of Chaucer's "Canterbury Tales" written in Middle English.
In 1990, Jim Sanborn created a sculpture called Kryptos for the CIA headquarters in Langley, Va. Kryptos contains four enciphered messages, but cryptanalysts have solved only three. The final message has very few characters (either 97 or 98, depending on whether one character truly belongs to the fourth message), making it very difficult to analyze. Several people and organizations have boasted about solving the other three messages, including the CIA and the NSA.
While these messages along with many others are unsolved today, there's no reason to believe they will remain unsolved forever. For more than 100 years, a ciphered message written by Edgar Allan Poe went unsolved, puzzling professional and amateur cryptanalysts. But in 2000, a man named Gil Broza cracked the cipher. He found that the cipher used multiple homophonic substitutions -- Poe had used 14 ciphers to represent the letter "e" -- and contained several mistakes. Broza's work shows that just because a code hasn't been solved doesn't mean it's not solvable [source: Elonka.com].

You're the Cryptanalyst

The following message is enciphered text using a method similar to one discussed in this article. There are clues in the article that can help you solve the cipher. It might take you a while to find a method that works, but with a little patience you'll figure it out. Good luck!
KWKWKKRWRKKKKKWRSRWWO
SWWSWORSSRWOROSROKSKWK
OKOKWSOWRSSORWRKWOWKR
KSRKRWKWRWSWRROWRSOKS
KSRSWRKKOOWOOOKSOKKRS
RWRWSWROSKKWRWKKSWKSS
RWOORWRWWSWSSKWSWOWRK
SWSWKWKOKKORKROWSKRRK
WSWWWKWOOROWSKRKSKOWW

Highlight below with your mouse to see the answer:
You have deciphered a code based on the ADFGX cipher used by Germany in World War I. The key word was Discovery.