Notes   /   12 November 2018

An activity to help understand the Markov Chain algorithm

Instructions and an example follow below. For each table, I have a short sample of text to work on. Perhaps you will recognize these sources, but it really doesn't matter for this activity.

Table 1

It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.

Table 2

To be, or not to be, that is the question: Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them? To die -- to sleep, No more; and by a sleep to say we end The heartache, and the thousand natural shocks that flesh is heir to: ’tis a consummation devoutly to be wish’d.

Table 3

No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely by intelligences greater than man’s and yet as mortal as his own; that as men busied themselves about their various concerns they were scrutinised and studied, perhaps almost as narrowly as a man with a microscope might scrutinise the transient creatures that swarm and multiply in a drop of water.

Table 5

In the bosom of one of those spacious coves which indent the eastern shore of the Hudson, at that broad expansion of the river denominated by the ancient Dutch navigators the Tappan Zee, and where they always prudently shortened sail and implored the protection of St. Nicholas when they crossed, there lies a small market town or rural port, which by some is called Greensburgh, but which is more generally and properly known by the name of Tarry Town.

Table 6

What was the secret, they wanted to know; in a thousand different ways they wanted to know The Secret. And not one of them was prepared, truly prepared to believe that it had not so much to do with chemicals and zippy mental tricks as with that most unprofound and sometimes heart-rending process of removing, molecule by molecule, the very tough rubber that comprised the bottoms of his training shoes. The Trial of Miles, Miles of Trials. How could they be expected to understand that?

Example

For the following example, I'll use this sample text, which is taken from Edgar Allan Poe's short story, "The Cask of Amontillado."

The thousand injuries of Fortunato I had borne as I best could, but when he ventured upon insult, I vowed revenge. You, who so well know the nature of my soul, will not suppose, however, that I gave utterance to a threat. At length I would be avenged; this was a point definitely settled--but the very definitiveness with which it was resolved, precluded the idea of risk.

First, build the database

Step 1: Make a list of all the words in your text

This is a list of all unique words. You can decide whether or not you want to include punctuation and capitalization, but if you do, "The" and "the" would be two separate entries. In my case, I chose to ignore capitalization and punctuation, so "the" is just one entry. This list is your "index."

Step 2: Make lists of all next words

For each word in your index, find each occurrence of that word in your text, and record a list of all of those "next" words. List each next word every time it occurs -- in other words, you should include all duplicates.

Write these lists off to the right of each index word. You should now have a database that looks something like this.

Next, generate some text

Step 3: Choose a random starting word

I like to use Random.org for this kind of thing. They have a widget on the homepage where you can get a number between 1 and the number words in your list. Or you can paste a list of words into their list randomizer.

I just rolled "4" so my word is at.

Step 4: Choose the next word

With that starting word chosen, look it up in your index, and then do a similarly random selection from the words in that words' "next word" list.

In my case, the only possible next word for "at" is length, so I'll choose that.

Step 5: Choose the next next word

Now, look up that next word in the index and randomly choose one of its next words to add to your chain.

In my case, the only possible next word for "length" is I, so I'll choose that. So far my chain is At length I.

Step 6: Repeat steps 5 and 6 until you've run out of next words or your generated text is long enough.

"I" has five next words. I rolled a 3, so my new word is vowed.

Share the result

Working through this method got me the following result.

At length I vowed revenge. You who so well know the idea of Fortunato I had borne as I best could but when he ventured upon insult I would be avenged

Not bad! It's a pretty tedious process, though, so it would be useful and indeed fairly straightforward to write a Python program that did this all for us.

Even better, there's already a decent Python library that automatically does all this for us!