TorfsBot is a bot that automatically imitates the tweets of Rik Torfs on Twitter.

Rik Torfs Twitter style

The bot imitates the writing style of professor Rik Torfs, the former rector of KU Leuven. Rik Torfs is well known on Twitter for his (semi-)philosophical tweets, which usually contain a subtle quip. Some example tweets by the real Rik Torfs:

This style is sometimes imitated by Twitter users on the hashtag #tweetenzoalsrik. Rik also regularly writes columns in national newspapers, which along with his tweets serve as the training data for the bot.

TorfsBot tweets

TorfsBot tweets about five times a day, at randomly chosen moments. The bot works by learning which words usually follow which other words in Rik Torfs' texts, and repeatedly predicting the next word to form full sentences. Alternatively, the bot sometimes substitutes several new context words into existing tweets of Rik Torfs. TorfsBot prefers shorter tweets, since these are usually more interesting.

TorfsBot also "reads" Flemish news sources, and is thus able to comment on the news just like the real Rik Torfs usually does.

Replies

While the real Rik Torfs never replies to tweets, TorfsBot answers almost every tweet it receives. To reply, TorfsBot generates thousands of possible replies and then picks one that fits the conversation, based on the keywords used throughout the conversation and the length of the last tweet.
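
The selection step can be illustrated with a minimal sketch. This is not TorfsBot's actual code: the function name `pick_reply`, the two weights, the `stopwords` parameter and the idea of preferring replies of similar length to the last tweet are all assumptions made for illustration.

```python
def normalise(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.lower().strip(".,!?:;\"'()")

def pick_reply(candidates, conversation, stopwords=frozenset(),
               keyword_weight=1.0, length_weight=0.01):
    """Pick the candidate that shares the most keywords with the conversation.

    Assumption: replies whose length is close to that of the last tweet
    are slightly preferred. The real bot may use the length differently.
    """
    last_length = len(conversation[-1])
    keywords = {normalise(w) for tweet in conversation for w in tweet.split()}
    keywords -= set(stopwords)

    def score(reply):
        reply_words = {normalise(w) for w in reply.split()}
        overlap = len(reply_words & keywords)
        return keyword_weight * overlap - length_weight * abs(len(reply) - last_length)

    return max(candidates, key=score)

conversation = ["Is het erger dan vroeger?"]
candidates = [
    "Niets is erger dan gelijk krijgen.",
    "De zon schijnt vandaag boven Leuven en dat is goed.",
]
best = pick_reply(candidates, conversation)
```

Here the first candidate wins because it shares more words with the conversation. In practice the candidate list would hold the thousands of generated replies.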

It is also clear that TorfsBot picks up on the words "erger" and "ik" to reply to the tweet commenting on Rik Torfs' rector re-election loss:

Discussions with real Rik Torfs

With a reasonably low probability, TorfsBot will reply to tweets of Rik Torfs, giving rise to interesting (albeit one-sided) "discussions" between the two.

Technical explanation

TorfsBot uses two different text generators at random for generating its tweets. I explain specific details in this paper, this slideshow and this video. TorfsBot's code is also available on GitHub.

Interpolated Markov Chain Model

The first method uses interpolated Markov chain models. It first analyses the given text, in this case all tweets and columns, and counts how often each word follows the two words preceding it. This way, whenever a text ends in a given pair of words, the model knows which words can follow and with which probability. For example, it might learn that after the two words "but I", the word "did" is a possible next word. It builds the same kind of table for the words that follow the last three and the last four words.
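
As a rough sketch of this counting step (not the bot's actual code; the names `build_chain` and `next_word_probabilities` and the whitespace tokenisation are assumptions), the tables for contexts of two, three and four words could be built like this:

```python
from collections import Counter, defaultdict

def build_chain(sentences, order):
    """Map every run of `order` consecutive words to a Counter of next words."""
    chain = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()  # naive whitespace tokenisation
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            chain[context][words[i + order]] += 1
    return chain

def next_word_probabilities(chain, context):
    """Turn the raw counts for one context into probabilities."""
    counts = chain.get(tuple(context), Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

corpus = ["but I did not know", "but I was there", "but I did try"]
chains = {order: build_chain(corpus, order) for order in (2, 3, 4)}
probs = next_word_probabilities(chains[2], ["but", "I"])
```

With this toy corpus, "did" follows "but I" in two of the three sentences and "was" in one, so `probs` assigns them probabilities of two thirds and one third.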

The model generates a text by first taking the two starting words of an existing Rik Torfs sentence (taken from somewhere in one of his tweets or columns). It then repeatedly looks at the last few generated words and randomly selects one of the possible next words. It prefers selecting words from the Markov chain that looks at more of the previous words, but will sometimes also select from the chains that look at fewer words. This makes the model plagiarize less text, while still not sounding too random and incoherent.
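
The generation loop might look roughly like the sketch below. It is an illustration under assumptions, not the published implementation: the `prefer_longer` probability, the `max_words` cap and the exact way of falling through to shorter contexts are made up for this example.

```python
import random
from collections import Counter, defaultdict

def build_chain(sentences, order):
    """Map every run of `order` consecutive words to a Counter of next words."""
    chain = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])][words[i + order]] += 1
    return chain

def generate(sentences, orders=(4, 3, 2), prefer_longer=0.7, max_words=30, rng=None):
    """Generate text by mixing Markov chains that look back 2, 3 and 4 words."""
    rng = rng or random.Random()
    chains = {order: build_chain(sentences, order) for order in orders}
    shortest = min(orders)
    # Seed with the first words of a randomly chosen source sentence.
    seeds = [s.split()[:shortest] for s in sentences if len(s.split()) >= shortest]
    if not seeds:
        raise ValueError("need at least one sentence long enough to seed from")
    words = list(rng.choice(seeds))
    while len(words) < max_words:
        candidates = None
        for order in sorted(orders, reverse=True):
            counts = chains[order].get(tuple(words[-order:]))
            if not counts:
                continue  # this context was never seen, try a shorter one
            candidates = counts
            # Usually keep the longest matching context, but sometimes
            # fall through to a shorter one to copy less source text.
            if rng.random() < prefer_longer:
                break
        if candidates is None:
            break  # no chain knows a continuation, so the text ends here
        choices, weights = zip(*candidates.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

corpus = [
    "but I did not know what the truth was",
    "but I did not say that the truth matters",
    "nobody knows what the truth matters for",
]
text = generate(corpus, rng=random.Random(0))
```

Raising `prefer_longer` makes the output follow the source more closely, while lowering it makes the output more varied and less coherent.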

Schematic overview of how a Markov model that looks back two words decides every word.

The texts from the Markov chain model are thus usually locally coherent (as every three consecutive words usually occur together somewhere in the writing of Rik Torfs), but can easily derail, since the model is unaware of what it generated more than a couple of words ago.

Dynamic Templates

The second method dynamically replaces keywords from Rik's tweets with words from his columns. It first randomly selects a Rik Torfs tweet, figures out which words are which type of word, and then replaces some of the keywords of the tweet with words having the same type from a text in a Rik Torfs column.
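
A toy version of this substitution is sketched below. The `WORD_TYPES` lexicon is a hypothetical stand-in for a real part-of-speech tagger, and `fill_template` and `replace_prob` are names invented for this example. Words with attached punctuation are simply left untouched here.

```python
import random

# Hypothetical toy lexicon. A real system would use a part-of-speech tagger.
WORD_TYPES = {
    "truth": "NOUN", "doubt": "NOUN", "university": "NOUN",
    "lonely": "ADJ", "strict": "ADJ",
}

def fill_template(tweet, column, replace_prob=0.5, rng=None):
    """Replace some typed words in `tweet` with same-typed words from `column`."""
    rng = rng or random.Random()
    # Collect the replacement words offered by the column, grouped by word type.
    pool = {}
    for word in column.split():
        word_type = WORD_TYPES.get(word.lower())
        if word_type:
            pool.setdefault(word_type, []).append(word)
    result = []
    for word in tweet.split():
        word_type = WORD_TYPES.get(word.lower())
        if word_type in pool and rng.random() < replace_prob:
            result.append(rng.choice(pool[word_type]))
        else:
            result.append(word)
    return " ".join(result)

new_tweet = fill_template(
    "The truth is lonely",
    "A strict university breeds doubt",
    replace_prob=1.0,
    rng=random.Random(0),
)
```

With `replace_prob=1.0` every recognised keyword is swapped, so the noun "truth" becomes either "university" or "doubt" and the adjective "lonely" becomes "strict".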

Schematic overview of the Dynamic Template algorithm

Fixing up

After generating a text with either method, TorfsBot cleans it up by

  • shortening too long tweets by throwing out sentences in the middle
  • balancing out missing brackets and quote marks
  • replacing irrelevant names with people that appear in today's news
  • updating dates to days that are in the near future

If the final generated text is not too long or unoriginal compared to its source corpus, it is then automatically tweeted on a random schedule.
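
One possible way to implement such a check is sketched below. How TorfsBot actually measures originality is not described here, so the longest-copied-run measure, the names `longest_copied_run` and `is_acceptable`, and both thresholds are assumptions.

```python
def longest_copied_run(text, corpus):
    """Length, in words, of the longest word sequence copied from any source."""
    words = text.split()
    # Pad with spaces so that only whole words can match.
    sources = [" " + " ".join(source.split()) + " " for source in corpus]
    best = 0
    for start in range(len(words)):
        # Only runs longer than the current best are worth checking.
        for end in range(start + best + 1, len(words) + 1):
            phrase = " " + " ".join(words[start:end]) + " "
            if any(phrase in source for source in sources):
                best = end - start
            else:
                break
    return best

def is_acceptable(text, corpus, max_length=280, max_copied_fraction=0.8):
    """Reject texts that are too long or mostly one verbatim copied run."""
    words = text.split()
    if not words or len(text) > max_length:
        return False
    return longest_copied_run(text, corpus) / len(words) <= max_copied_fraction

corpus = ["the truth is rarely pure and never simple"]
run = longest_copied_run("the truth is never simple", corpus)
ok = is_acceptable("the truth is never simple", corpus)
copied = is_acceptable("the truth is rarely pure and never simple", corpus)
```

In this example the generated text shares at most three consecutive words with the source, so it passes, while the verbatim copy is rejected.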

TorfsBot Or Not?

We recently launched TorfsBot Or Not?, a Twitter bot that posts a poll containing a tweet from either Rik Torfs or TorfsBot. It then asks its followers to judge whether the tweet originally came from Rik Torfs or from TorfsBot. Whenever most people think a TorfsBot tweet was originally from Rik Torfs, it alerts everyone that TorfsBot has passed the Turing test that day.

For more information, see the TorfsBot Or Not? project page or @TorfsBotOrNot on Twitter.
