Wordle: Finding the Right Words to Say

13 minute read

Published:

The Wordle trend has grabbed the attention of an omicron laden society stumbling their way into a new year. It is a fun word game by Josh Wardle where each day a new word is released for people to guess. The daily release schedule, minified design, and simple game mechanics have made the game a viral hit.

On being shown the game the first time I was naturally curious as to what the best guessing strategy is. So I did a bit of thinking and some research to arrive at a couple good solutions. These are outlined and compared below alongside some other interesting information about the game.

How to Play

The objective of the game is to guess a secret 5 letter word. Upon guessing a word the player is informed for each letter whether it is correct, misplaced, or incorrect. Correct means that the corresponding letter is in the secret word in the same position, while incorrect means that letter is not in the secret word at all. Misplaced means the letter is in the secret word at least once, but is in the wrong position.

For example, if the secret word is “dance” and the player guesses “later”, then the result would be INCORRECT, CORRECT, INCORRECT, MISPLACED, INCORRECT. The player wins if they can get 5 correct letters (guess the word) in 6 turns, otherwise they lose the game.

Nostalgic board game players will recall that this is the same as the popular 70s game Mastermind using letters instead of colored pegs. In Mastermind one player chooses a secret code of 4 colored pegs with 6 possible colors. Following the same rules outlined above the other player tries to guess the secret code. Mastermind itself is actually an adaptation of an older game called Bulls and Cows.

Some Interesting Properties

If we assume that all \(26^{5}\) letter combinations are possible codewords, then Wordle is equivalent to Mastermind with 26 colors and length 5 codes. It has been shown that Mastermind is NP-complete in the length of the code [1].

Best strategies?

When selecting a strategy our main concerns are the maximum number of guesses and the likelihood of winning. If the maximum number of possible guesses is above 6, then it is possible for that strategy to lose. When it is possible for a strategy to lose, then we care about how often it will win. The best strategy would never lose, but if that is not feasible, then it is desirable to find the one which wins most often.

With these objectives in mind there are several ways to approach choosing the next guess. You can try to find the most likely next word, the word which will remove the most possibilities from the word list, etc… I describe some here which I found to be quite successful. In each I make the assumption that you know the full list of possible words \(\mathcal{W}\). This is a reasonable assumption, since in the worst case you can use all 5 letter words in the english dictionary as \(\mathcal{W}\).

Random Guessing

The most obvious strategy and a control for the others is random selection. Each turn the player randomly selects a word from \(\mathcal{W}\) as their guess.

If the player randomly selects without replacement and does not remove impossible words after each guess, then the expected number of guesses is \(\lvert\mathcal{W}\rvert /2\) and the maximum is \(\lvert\mathcal{W}\rvert\). This is a fairly poor strategy as you have a \(6/ \lvert\mathcal{W}\rvert\) chance of winning. However, it can be greatly improved by removing impossible words from \(\mathcal{W}\) after each guess as this drastically reduces the number of words to select from. This is also similar to how people play the game. They do not continue guessing words they know will not work.

Minimax

Normally in game theory a random strategy can be improved by using a minimax selection criteria. This entails performing the action which minimizes the worst-case scenario. In this game the worst-case scenario is guessing a word which removes the fewest number of possible remaining guesses. Therefore, we want to guess a word that minimizes the maximum the number of possible words left. That is

\[w^* = \argmin_{w} \max_{w'} \ell(w,w')\]

where \(w, w' \in \mathcal{W}\) and \(\ell(g,s)\) is the number of words left in \(\mathcal{W}\) if you guess \(g\) and \(s\) is the secret. In this strategy you make the minimax guess, update the list of possible words \(\mathcal{W}\), and repeat until you get it correct.

Greedy Probabilistic Guessing

Another approach would be to choose the most probable word. There is more than one way to meaningfully define how likely a word is to be the secret. In this case we choose the word with the most likely letters in each position. It is fairly straightforward to define probability here based on frequency in the word list. This local optimality is what makes the method greedy.

However, just choosing the most likely letter for each position will sometimes produce invalid words. So we can sort the letters based on probability and run down the list until a valid word is found. The first valid word will be our guess.

Genetic Algorithm

So far these approaches have all been deconstructive. They attempt to deduce the best word from a large list of possible words. We can also try a constructive approach where we learn how to generate good guesses. A genetic algorithm is great for this. The idea of a genetic algorithm is to continually evolve and mutate a population, while imposing natural selection until the members reaches some objective.

We begin with a population of words \(P\) such that \(P \subseteq \mathcal{W}\). Then we perform selection to find the most fit members of \(P\) and “breed” them using crossover. Finally, members are subject to mutation, inversion, and permutation. After some fixed number of generations the most fit word is used as a guess.

There is a lot to unpack here, so let me explain in a bit more detail. First, is the fitness function. We need some objective to compare words in \(P\) and determine which are better. For this we can modify a metric from [2]. That fitness function is

\[\textrm{fitness}(w) = -\sum_{i=1}^{\# guesses} \left( \lvert \textrm{correct}(g_i, w) - c_i \rvert + \lvert \textrm{misplaced}(g_i, w) - m_i \rvert \right)\]

where \(g_i\) is the guess from turn \(i\), \(c_i\) is the number correct from turn \(i\), \(m_i\) is the number misplaced from turn \(i\), \(\textrm{correct}(g,s)\) is the number of correct letters for guess \(g\) and secret \(s\), and likewise for \(\textrm{misplaced}(g,s)\). This works by assuming \(w\) is the secret word and measuring how well that assumption lines up with our observations so far.

Next you need to simulate natural selection on \(P\) to determine which words die out or have the opportunity to repopulate. There are many good ways to do selection [3]. Here we use tournament selection, where a random subset of \(P\) is selected and a tournament is held to decide who gets to move on. The highest fitness value determines the tournament winner.

Once selection has occurred crossover takes place. During this phase random pairs are selected as parents and with some probability \(p_{\textrm{crossover}}\) they perform crossover. This action involves choosing a random pivot in the word and splitting each parent at that pivot. The parents have 2 children which are made from joining different parts from each parent. For example, if “stage” and “slate” are the two parents, then one pivot might lead to “slage” and “state” as children. These children will move on to the next generation.

Finally, each child might mutate with probability \(p_{\textrm{mutate}}\), invert with probability \(p_{\textrm{inversion}}\), and permute with probability \(p_{\textrm{permutation}}\). These are each straightforward random changes to the word and are typically set up to happen with low probability (i.e. 3% chance).

This process happens a fixed number of generations \(N\). The word with the highest fitness after the last generation is selected as the guess.

Comparing the Strategies

Each of the above strategies I implemented here and simulated 5000 games with to measure their performances. Since the genetic algorithm has many different parameters I used Tree Parzen Estimation [4] over 200 iterations to find a good parameter set. The results are presented below.

The genetic algorithm wins in the average case with the lowest average and highest percentage of games won. It also ends up being the slowest, but at ~0.15 seconds per game it is still a feasible solution. Minimax and the probabilistic approaches also work very well winning ~98% of their games. Unfortunately, none of them guarantee a win. They all had between 1 and 2 percent of games take over 6 turns, but this is still an acceptable win rate.

What was most interesting to me is how well the random approach worked. If you know the list of possible solutions \(\mathcal{W}\) and you continually guess random words from \(\mathcal{W}\) removing impossible ones each time, then you will still win ~92% of the time in 5 guesses on average. It turns out it is entirely realistic to know the set of words for this game as I discuss below. And since the other strategies take ~4 guesses on average this shows that using a strong guessing strategy over a random one will only save you 1 guess in the average case.

Playing Unfair

These strategies are all great, but they ignore an important fact: Wordle is an entirely browser based game. There is no session, account, or server sending new words to you. This means the code for the entirety of the game is stored in your browser while you play it. You can view the source, albeit it has been minified making it difficult to decipher.

Knowing the Right Words

Even with the minified JS code it is easy to see there are two ginormous word lists. One with ~2000 words, \(\mathcal{W}^{(1)}\), and the other with ~10,000, \(\mathcal{W}^{(2)}\). Upon inspection it becomes clear that \(\mathcal{W}^{(1)}\) is the list of possible secret words. Furthermore, \(\mathcal{W}^{(1)} \cup \mathcal{W}^{(2)}\) is the set of words the game will allow you to guess.

Knowing the list of possible secret words allows us to make better guesses by choosing exclusively from this list. This is how the results above were computed. Additionally, since \(\lvert\mathcal{W}^{(1)}\rvert \approx 2000\) this list is much smaller than the list of all 5 letter english words. Even a list of only common english words has ~16,000 entries with 5 letters.

Knowing THE Word

If you look even further for patterns in these lists you will find that the secret word is selected based on the number of days since the game’s release. If it has been \(i\) days since June 19th, 2021, then the secret word will be the \(i\)-th element of \(\mathcal{W}^{(1)}\). This makes the game trivial as you know what the secret word will be for any given day.

You can also simply look at the local memory store in your browser where a game state object stores the secret word in plain text. However, neither of these are foolproof as the developers could change how the secret word is selected and stored whenever they want.

Extra: Best First Guess

Looking through the source code for the secret word is pretty much cheating and takes away the fun of the game. Also using a computer to make guesses is an interesting coding and game theory project, but is, alas, incredibly boring. One hint we can get from computers though is what the best first guess might be.

As with the solvers you can approach this in several different manners. The best guess might be the one which has the most frequent letters or it could be the word which satisfies the minimax criteria. Looking exclusively at \(\mathcal{W}^{(1)}\) I have found “slate” to be the best first guess for each of my algorithms. “cares” and “soare” (the latter which this article suggests) are also good first guesses. They are not in the list of possible secret words, but they remove a lot of options.

If we select words strictly from \(\mathcal{W}^{(1)}\) “raise” removes 2078.5 possible words from \(\mathcal{W}^{(1)}\) on average. “cigar” removes the most with 2314 words removed when “sugar” is the key.

If we can choose any word from \(\mathcal{W}^{(1)} \cup \mathcal{W}^{(2)}\), then the highest average is “soare” with 2095.479 removed words on average. “cigar” removes the most with 2314.0 words removed when the key is “sugar”.