
The Shepherd's Farm


Around the Ladoum Sheep and More. A Blog from Jacques E. Boillat, Thu April 21 2022

Spell Checkers and DNA

Spell Checkers

A spell checker [1] is a computer program used to find typing errors in a text. It compares the words in the text with the words of a dictionary. A word is a sequence of characters, which is called a string in computer science. There are two possibilities when processing a word of the text. If the word is in the dictionary, everything is fine. If not, it is treated as a typing error. There are three possible kinds of errors:

  1. Substitute [S]: a letter has been replaced with another
  2. Delete [D]: there is an extra letter in the text
  3. Insert [I]: one letter is missing in the text

I don't want to describe the exact algorithm here, but the idea is simple: given a misspelled word, we modify it using the operations [S], [D], and [I] until we get a valid word in the dictionary. We do it as economically as possible, i.e. using the minimal number of operations. We call that number the edit distance between the text word and the word in the dictionary.
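The idea above can be sketched in a few lines of Python. The sketch uses the classic dynamic-programming method for the edit distance, which is one common choice and not necessarily what a real spell checker does. The names `edit_distance` and `suggest` and the tiny dictionary are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimal number of [S], [D], [I] operations that turn a into b."""
    # prev[j] is the distance between the letters of a processed so far
    # (before the current one) and the first j letters of b.
    prev = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        curr = [i]  # turning i letters into the empty word takes i deletions
        for j, letter_b in enumerate(b, start=1):
            substitute = prev[j - 1] + (letter_a != letter_b)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: set[str]) -> str:
    """Return the word itself if it is valid, else the closest dictionary word."""
    if word in dictionary:
        return word
    # sorted() only makes the choice deterministic when two words are equally close
    return min(sorted(dictionary), key=lambda entry: edit_distance(word, entry))


if __name__ == "__main__":
    words = {"spelling", "spam", "poems"}  # hypothetical mini dictionary
    print(suggest("speling", words))       # spelling
    print(edit_distance("spam", "poems"))  # 4
```

A real dictionary has many thousands of entries, so comparing a misspelled word with every entry, as `suggest` does here, is only reasonable for a toy example.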

Note that this simple strategy will not find any error in the sentence "Their coming too sea if its reel", because every single word in it is a valid dictionary word.

From spam to poems

For example, one way to get from spam to poems is through a sequence of 4 operations.
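One possible sequence of 4 operations can be replayed in Python. This is only an illustration: other sequences of the same length exist, and the helper names `substitute`, `delete`, and `insert` are assumptions chosen to mirror [S], [D], and [I].

```python
def substitute(word: str, position: int, letter: str) -> str:
    """[S] replace the letter at the given position."""
    return word[:position] + letter + word[position + 1:]


def delete(word: str, position: int) -> str:
    """[D] remove the extra letter at the given position."""
    return word[:position] + word[position + 1:]


def insert(word: str, position: int, letter: str) -> str:
    """[I] add a missing letter in front of the given position."""
    return word[:position] + letter + word[position:]


steps = ["spam"]
steps.append(delete(steps[-1], 0))           # [D] drop the leading s
steps.append(substitute(steps[-1], 1, "o"))  # [S] a becomes o
steps.append(insert(steps[-1], 2, "e"))      # [I] add e before m
steps.append(insert(steps[-1], 4, "s"))      # [I] add s at the end

print(" -> ".join(steps))  # spam -> pam -> pom -> poem -> poems
```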

DNA Sequences

DNA [2] is a sequence of molecules known as A, T, C, and G. These four molecules can be considered the encoding elements, or building bricks, of life. A gene is a sequence consisting of these four molecules. Just as you can build a house with bricks, you can build a gene with a sequence of the molecules A, T, C, and G, i.e. a sequence of bricks. We could also say that the language of DNA uses the alphabet {A, T, C, G}.

Each living organism has a long sequence of these letters that makes up its genome. Computer scientists refer to a sequence of symbols as a string. If we want to compare two individuals, we do it by comparing their DNA sequences, i.e. by comparing two strings. We already know how to compare words in a text, and we can use exactly the same technique to compare DNA sequences.
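To make this concrete, here is a minimal Python sketch that applies the edit distance to two short strings over the alphabet {A, T, C, G}. The two sequences are invented for the example, and the `similarity` formula (one minus the distance divided by the longer length) is a simplifying assumption. Real genome comparison tools use more refined alignment scores.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimal number of substitutions, deletions and insertions from a to b."""
    prev = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        curr = [i]
        for j, letter_b in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (letter_a != letter_b),  # substitute (or keep)
                prev[j] + 1,                           # delete from a
                curr[j - 1] + 1,                       # insert into a
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Rough share of identical letters, between 0.0 and 1.0 (assumed formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    seq_a = "GATTACA"  # made-up sequence
    seq_b = "GACTATA"  # made-up sequence, differs in two letters
    print(edit_distance(seq_a, seq_b))
    print(f"{similarity(seq_a, seq_b):.1%}")
```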

Humans, Chimpanzees, and Gorillas

Question: Are humans more closely related to chimpanzees or to gorillas?
Answer: Humans are evidently more closely related to chimpanzees than to gorillas. Humans and chimpanzees diverged from their common ancestor approximately 4–6 million years ago, while gorillas diverged about 2 million years before that.
Why: The primary method involves computational analysis of DNA.

RESULT: Based on DNA comparison, any two human beings are 99.9% identical. The value drops to 98.8% when comparing human beings with chimpanzees. It drops again to 98.4% when comparing human beings with gorillas [4]!

Note that the compared DNA sequences are extremely long. The length of the human genome is 3,117,275,501 letters [3]. Due to the size of the problem, it is not that easy to write an efficient program computing the distance between two DNA sequences, but it is possible.
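A little arithmetic shows what these numbers mean. The Python sketch below makes a rough assumption: it applies each percentage to the full length of the human genome, which ignores that the genomes differ in length and that such percentages are measured on aligned regions. It also counts the cells of the table that the textbook dynamic-programming method for the edit distance would fill for two sequences of that length.

```python
GENOME_LENGTH = 3_117_275_501  # letters in the human genome, as quoted above


def differing_letters(length: int, identical_permille: int) -> int:
    """Letters that differ when identical_permille out of 1000 are the same."""
    return length * (1000 - identical_permille) // 1000


comparisons = [
    ("human vs human", 999),       # 99.9% identical
    ("human vs chimpanzee", 988),  # 98.8% identical
    ("human vs gorilla", 984),     # 98.4% identical
]

for label, permille in comparisons:
    print(f"{label}: about {differing_letters(GENOME_LENGTH, permille):,} differing letters")

# One table cell per pair of letters: the table grows with the square of the length.
table_cells = GENOME_LENGTH ** 2
print(f"table cells for two full genomes: {table_cells:.2e}")
```

Even 0.1% of the genome is more than three million letters, and a table with close to 10^19 cells is far too large to store or fill directly. This is why practical programs need smarter methods than the naive table.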

DNA comparison is now frequently used to compare animals of the same breed!


  1. Wikipedia. Spell Checker (last visited Apr. 19, 2022)

  2. Wikipedia. DNA (last visited Apr. 19, 2022)

  3. Wikipedia. Human Genome (last visited Apr. 19, 2022)

  4. Smithsonian National Museum of Natural History. Genetic evidence (last visited Apr. 21, 2022)


Prof. Dr. Jacques E. Boillat is a retired professor of Computer Science. He studied Mathematics, Theoretical Physics, and Philosophy at the University of Berne, in Switzerland. He taught at the University of Applied Sciences in Berne and at the University of the Gambia from 2006 to 2020. His wife Lenna Correa Boillat is the owner of the Shepherd's Farm in Bato Kunku.

