In the previous post, I discussed the various AI systems in Ghul. This is a similarly deep dive into the behavior-governing systems of my other AI-intensive game, Winter Palace (a.k.a. "my Love Letter clone").

To start off, unlike Ghul, which was conceived and executed as an art game, the development of Winter Palace was driven by a desire to create an AI system that could capably play Love Letter. At that point, I was much better versed in the basics of video game AI and wanted to test my own skills at it – more so than to create an engaging game. It was only after I realized that my AI could actually carry an entire game that I considered slapping some original assets onto it and publishing it online. Because said assets were an afterthought, this text will refer to individual cards by their original LL names rather than their WP names.

Also unlike Ghul, which had some ambiguity regarding its monster AI's cognitive frame, the AIs in Winter Palace (internally nicknamed Athos, Porthos, and Aramis; the player character is, of course, D'Artagnan) very clearly operate in the ludic frame and are out to beat the player, putting them into the Rival AI category. The only concession, where my AI is not strictly playing to win, is that it will never target the human player until the latter has made at least one move: getting randomly knocked out before you can do anything is both frustrating and makes playtesting difficult.

Controllers

Since the game was originally intended as a test bed for various AI architectures, I have written abstract player controller and perceptor interfaces that can be instantiated with different implementations of the same basic functionality. In the case of controllers, this functionality is selecting one of the legal moves on your turn and carrying it out. The UI controller used by the player extends the same abstract player controller class.

The knowledge of which moves are legal, given the two cards in your hand and the situation at the table, is provided by the game: specifically, each card class contains a method returning all legal moves involving it and another one for resolving them. The AI controller's task is therefore to evaluate these moves and to call the resolution method for the move it has selected.
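To make this concrete, here is a simplified sketch of these abstractions; names and signatures (PlayerController, Card, Move, GameState) are approximations for this post, not the literal source:

```csharp
using System.Collections.Generic;

// Stub types standing in for the real game classes.
public class GameState { /* discard pile, players, deck size, ... */ }
public class Player { /* hand, score, knockout status, ... */ }
public class Move { public Card Played; public Player Target; /* plus card-specific data */ }

public abstract class PlayerController
{
    // Called by the game on this player's turn: evaluate the legal moves
    // and resolve exactly one of them. The UI controller used by the human
    // player extends this same class.
    public abstract void TakeTurn(GameState state, List<Move> legalMoves);
}

public abstract class Card
{
    public abstract int Value { get; }

    // Each card class lists all legal moves involving it...
    public abstract List<Move> GetLegalMoves(GameState state, Player owner);

    // ...and resolves the one the controller has selected.
    public abstract void Resolve(Move move, GameState state);
}
```

In ascending order of complexity and effectiveness, I have implemented the following controllers: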

  • Random: This controller picks one of the legal moves independently and (almost) uniformly at random. While my original intention was to make it as stupid at decision-making as possible, I found that AI opponents knocking themselves out – by playing the Princess, or by not playing the Countess when holding a Prince or a King – made the game excessively boring, so I built in a check that specifically prevents those two suicidal moves.
  • Not stupid: While the random controller is already not suicidal, this one's objective is to also make no moves that would look stupid to a human player. To do so, it categorizes all of its legal moves as either stupid or not stupid and picks a uniformly random non-stupid one if any are available, resorting to a random stupid one otherwise. Stupid moves include: targeting a player protected by a Handmaid, not playing a card from your hand after another player has seen it, playing the higher-value card in the endgame, playing a Guard and guessing a value of which all copies have already been played, etc. I observed all of these cases when playing against random controllers and hard-coded them into the evaluation function of the not-stupid controller. Note also that, unlike the random controller, this one processes non-trivial information about the game (such as the contents of the discard pile and of others' hands, if it knows any).
  • Best utility: This controller is a true dual-utility system in the vein of Dill and Mark, in that it computes two scores for each legal move – its utility and its rank (see below) – then selects the move with the highest utility within the highest rank containing utilities greater than zero. Because this system considers all legal moves and picks the one it deems best, even when the best available move is one it recognizes as stupid, it displays an algorithmic stability akin to that of fuzzy control systems.
    • A move's rank represents its priority: from "never-ever" (reserved for suicidal moves only) and "barely sensible" (for stupid moves), through "only-if-necessary" (low priority), "normal", and "paramount" (high priority), to "must-play" (reserved for moves required to stay in the game). Rank is a more differentiated evolution of the stupid/not-stupid categories. Naturally, several moves can have the same rank/priority.
    • A move's utility represents its usefulness and is a weighted sum of several considerations: the specific effectiveness of the particular move; the usefulness of the card the player keeps for the next turn, as well as whether another player has seen it; the danger the target poses (a fusion of whether the target knows the AI's hand, how many players they have knocked out, and their overall score); and the AI's "grudge" against the target (how often they have targeted the AI). The data for these considerations comes from the perceptor system, while the logic for computing individual cards' utility is offloaded into their own classes (similarly to the methods for listing legal moves and resolving them). Note also that how often a player has targeted the AI, how many others they have knocked out, and their score all persist across game rounds, potentially creating an illusion of long-term personal rivalry between AI players.
  • Randomized utility: Because the best utility controller is fully deterministic and the game it plays is not, it can end up consistently picking suboptimal moves (which is undesirable in a Rival AI) due to uncertainty from two sources: bad game state estimation by the perceptor and suboptimal utility computation rules by the developer. General AI research has shown, however, that simply randomizing the decision-making process (within sensible bounds) is an effective way of dealing with such input uncertainty. The randomized utility controller therefore calculates the utility for each legal move, then considers the top options within the highest rank and picks one of them at random, weighted by their respective utilities (see the sketch below). This controller has consistently achieved the best scores in the game when compared to other controllers using the same perceptors.
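To make the rank/utility selection concrete, here is a simplified sketch of the logic shared by the two utility controllers (again, the names are approximations for this post, and it assumes at least one legal move has positive utility):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative rank scale matching the priorities described above.
public enum Rank { NeverEver, BarelySensible, OnlyIfNecessary, Normal, Paramount, MustPlay }

public sealed class ScoredMove
{
    public Move Move;       // the Move type from the earlier sketch
    public Rank Rank;       // the move's priority class
    public double Utility;  // weighted sum of the considerations listed above
}

public static class DualUtility
{
    // "Best utility": the single best move within the highest rank that
    // contains at least one positive-utility move.
    public static Move PickBest(IEnumerable<ScoredMove> scored) =>
        TopRankCandidates(scored).OrderByDescending(m => m.Utility).First().Move;

    // "Randomized utility": same candidate set, but sampled with probability
    // proportional to utility (roulette-wheel selection).
    public static Move PickWeighted(IEnumerable<ScoredMove> scored, Random rng)
    {
        var candidates = TopRankCandidates(scored);
        double roll = rng.NextDouble() * candidates.Sum(m => m.Utility);
        foreach (var m in candidates)
        {
            roll -= m.Utility;
            if (roll <= 0) return m.Move;
        }
        return candidates[candidates.Count - 1].Move; // guard against rounding error
    }

    // The highest rank that contains any positive-utility move.
    private static List<ScoredMove> TopRankCandidates(IEnumerable<ScoredMove> scored) =>
        scored.GroupBy(m => m.Rank)
              .Where(g => g.Any(m => m.Utility > 0))
              .OrderByDescending(g => g.Key)
              .First()
              .Where(m => m.Utility > 0)
              .ToList();
}
```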

I also had plans to create a "yomi controller" and a game-theoretic controller, but decided not to go ahead with them, because the utility systems already showed mostly satisfying performance. A yomi controller (named after the concept of "yomi" popularized by David Sirlin) would have considered not only the immediate effects of its moves on the game state, but also other players' beliefs about its own hand and the effect its moves would have on said beliefs, effectively playing not just the game but the other players, too. The game-theoretic controller would have calculated a tree of all possible outcomes and other players' moves and picked the move that maximized its chances of winning the round. Needless to say, both would have been extremely complex without necessarily showing a clear advantage over the utility systems.

Perceptors

Evidently, the effectiveness of an AI controller in an incomplete information game like Love Letter is limited by how well it can estimate the value of the cards it cannot currently observe. To this end, I have written an abstract "perceptor" class, whose implementations not only perceive the visible game state (such as moves, their resolutions, and the discard pile), but also estimate its hidden parts. The two main methods of a perceptor class return a) the probability that a given player holds a specific card (a discrete distribution) and b) the probability that a particular card will be drawn from the deck next (ditto).
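In sketch form, that contract looks roughly like this (method names and signatures are approximations, reusing the stub types from the controller sketch):

```csharp
// A sketch of the abstract perceptor contract described above.
public abstract class Perceptor
{
    // Observation hook: the game feeds every resolved move to the perceptor,
    // which updates its internal estimates accordingly.
    public abstract void ObserveMove(Move move, GameState state);

    // a) One entry of the discrete distribution over a player's possible hand:
    // P(player holds a card of this value).
    public abstract double GetHandProbability(Player player, int cardValue);

    // b) P(the next card drawn from the deck has this value).
    public abstract double GetDeckProbability(int cardValue);
}
```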

Like with the controllers, I have programmed several perceptor systems, listed here in ascending order of complexity and effectiveness:

  • Priori: This primitive "perceptor" doesn't pay attention to the game at all and always returns the starting card distribution of a full deck (i.e. it will always report a 5/14 probability that another player holds a Guard, even when all Guards are in the discard pile).
  • Naive posteriori: All posteriori perceptors observe the turns they and other players make and update their estimations of others' hands and of the deck. However, the naive posteriori perceptor differs from the priori one only in that it accounts for the cards it has already observed in its own hand, in the discard pile, or by peeking into another player's hand. This results in a somewhat better estimation but remains ignorant of the more nuanced aspects of the game.
  • Simple posteriori: This and all later posteriori perceptors maintain and update separate probability distributions for the deck and for the hand of each other player still in play. The simple perceptor differs from the naive one mainly in additionally accounting for the effects that certain cards, like the Priest, Baron, Prince, and King, have on its own knowledge of the game state.
  • Full and advanced posteriori: These two perceptors effectively do the same thing, the "full" being the first, simpler, and faultier iteration and the "advanced" being the most mathematically correct (although still simplified) estimator. They work by updating hand and deck distributions in several steps (a simplified sketch of the core update follows this list):
    1. Updating the hand distribution of the player who went last, based on which card they played. Essentially, this estimates which card the player kept in their hand, given the distribution of what they held before, the deck distribution, the specific card they played, and a matrix of likelihoods of playing card x when holding cards x and y. This matrix accounts for the impossibility of situations like "playing a King and keeping a King" (since there is only one King card) and "playing a King while keeping a Countess" (since that would automatically knock the player out of the round); the less trivial likelihoods, like that of playing a Baron when you also have a Prince, have been determined empirically (see below).
    2. Optionally accounting for any card discarded by a player in addition to the one played, e.g. upon a knockout or when a Prince has been played.
    3. Analyzing the card-specific outcomes of the turn. E.g. when a player has played a Baron but neither they nor their target has been knocked out, it is obvious that both hold cards of the same value, and their hand distributions should be updated accordingly.
    4. Updating the card distribution in the deck with the information from the previous steps. Especially in the advanced perceptor, this step is extremely complex, involving four nested loops, but the estimation accuracy it delivers is hard to argue against.
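Here is the promised sketch of step 1, boiled down to a single Bayes update. It is heavily simplified compared to the real perceptors: it assumes card values are indexed 0..n-1, distributions are plain double arrays, and likelihood[x, y] is the empirically estimated P(play x | hand {x, y}), set to 0 for impossible pairs.

```csharp
using System.Linq;

public static class KeptCardUpdate
{
    public static double[] Posterior(
        double[] priorHand,   // P(held value v) before the turn
        double[] deck,        // P(drawn value v), from the deck distribution
        double[,] likelihood, // P(play x | hand {x, y})
        int played)           // value index of the card actually played
    {
        int n = priorHand.Length;
        var posterior = new double[n];
        for (int kept = 0; kept < n; kept++)
        {
            // P(hand was {played, kept}): either card may have been the drawn one.
            double pHand = priorHand[played] * deck[kept]
                         + priorHand[kept] * deck[played];
            posterior[kept] = pHand * likelihood[played, kept];
        }
        double total = posterior.Sum(); // normalize so the posterior sums to 1
        if (total > 0)
            for (int kept = 0; kept < n; kept++) posterior[kept] /= total;
        return posterior;
    }
}
```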

Unsurprisingly, the advanced posteriori perceptor shows the best performance overall – in fact, its ability to accurately guess which cards other players hold is downright spooky and greatly outpaces human deduction capacity (as well as AI controllers' ability to bluff). It is paired with the randomized utility controller for the "brainy" difficulty mode, showing impressive competence at the game. The one edge case that it currently cannot elegantly handle is a scenario wherein a Baron play results in a draw and the player then learns the value of the hand of one of the players involved (e.g. with a Priest): the perceptor cannot extend this knowledge to the other player's hand, because individual hand distributions are stored as separate arrays. The current workaround is to explicitly link the two distributions, at least until one of the players makes another move. The mathematically correct solution would instead be to maintain a joint distribution of all possible hidden hands and to slice/project it as necessary (see the toy sketch below), but implementing multidimensional matrix operations in C# has so far proven complicated, which is why I still haven't finished it.
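To illustrate what slicing and projecting would buy, here is a purely illustrative toy version for just two hidden hands, where joint[a, b] = P(player A holds value a AND player B holds value b); the real solution generalizes this to all opponents at once:

```csharp
public static class JointHands
{
    // Projection: marginal P(player A holds value a).
    public static double[] MarginalA(double[,] joint)
    {
        int n = joint.GetLength(0), m = joint.GetLength(1);
        var marginal = new double[n];
        for (int a = 0; a < n; a++)
            for (int b = 0; b < m; b++)
                marginal[a] += joint[a, b];
        return marginal;
    }

    // Slicing: once player B's card is revealed (e.g. via a Priest), condition
    // on b = revealed and renormalize. Any Baron-draw correlation stored in the
    // joint then propagates to player A's estimate automatically.
    public static double[] ConditionOnB(double[,] joint, int revealed)
    {
        int n = joint.GetLength(0);
        var slice = new double[n];
        double total = 0;
        for (int a = 0; a < n; a++) { slice[a] = joint[a, revealed]; total += slice[a]; }
        if (total > 0)
            for (int a = 0; a < n; a++) slice[a] /= total;
        return slice;
    }
}
```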

Update 01.01.2020: I have finally found the time and motivation to finish the joint distribution solution I started in August 2018. After reimplementing the entire matrix manipulation logic from scratch and fixing quite a few errors in my reasoning, I have managed to reduce the RMSE of opponent hand estimations (relative to the actual opponent hands) by 6%, and the cross entropy by a whopping 12%, compared to the advanced posteriori perceptor. Another thing that considering each individual game state allowed me to do was to implement yomi-like reasoning about how sensible an observed move would have been given the assumed hand distributions, which I will have to expand upon in future versions of the game. The new "extended posteriori perceptor" (I keep killing it with creative names) is the one used in the brainy mode as of version 1.1.0, now available for download.

As an aside, a mathematically perfect estimator would consider all possible game states, i.e. permutations of the 13 cards (including the one card always set aside, but excluding the card drawn by the player at the start of the game), and eliminate them one by one as the game goes on. Given that this would require keeping track of 13! > 6.2 billion possible game states, such a system would be prohibitively expensive without very efficient data processing. Still, a game-theoretic controller would require an equivalent amount of space and computational power, so I might return to the idea some day.

Adversarial Monte Carlo Learning

Previously, I mentioned the matrix containing the conditional probabilities of which card a player has kept in their hand, given the card they played. Because these likelihoods derive from the extremely non-linear and subject-to-change utility functions, I didn't want to estimate them by hand. Instead, I took the easy way out and let the AI play over 600 rounds against itself, recording which cards it played and which it kept (a process I pretentiously dub "adversarial Monte Carlo learning"). Afterwards, I statistically analyzed the aggregate results, manually confirmed their plausibility, and hard-coded the resulting matrix into the AI code (see the sketch at the end of this section). This method has several flaws, such as:

  • The matrix is only valid for a specific combination of a perceptor and a controller, but not for other AI components or for human players.
  • If the utility computation of even a single card changes significantly, the entire matrix needs to be re-estimated by playing another 600 rounds.
  • The matrix does not account for shifting player priorities over time, e.g. that the marginal utility of keeping high-value cards is greater towards the end of a round than at the beginning.

However, in my experience and opinion, reducing the extreme complexity of reverse-engineering utility computations to just running a bunch of games and analyzing the results is a fair trade-off, especially since the results are quite impressive (in fact, I strongly believe that the matrix is a large part of the AI's spooky ability to guess cards correctly).
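For the curious, the tallying step of this process boils down to something like the following sketch (the names and the eight-value card range are assumptions for illustration): log a (played, kept) pair for every self-play turn, then normalize the counts per unordered hand.

```csharp
using System.Collections.Generic;

public static class LikelihoodEstimator
{
    public static double[,] Estimate(IEnumerable<(int played, int kept)> log, int numValues = 8)
    {
        // Tally how often each (played, kept) pair occurred in self-play.
        var counts = new double[numValues, numValues];
        foreach (var (played, kept) in log)
            counts[played, kept]++;

        // Normalize per unordered hand {x, y}, so that
        // P(play x | {x, y}) + P(play y | {x, y}) = 1.
        var matrix = new double[numValues, numValues];
        for (int x = 0; x < numValues; x++)
            for (int y = 0; y < numValues; y++)
            {
                double pairTotal = counts[x, y] + counts[y, x];
                matrix[x, y] = pairTotal > 0 ? counts[x, y] / pairTotal : 0;
            }
        return matrix; // impossible pairs never occur in the log and stay at 0
    }
}
```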