As long as humans are competing in strategy-based games, savvy players will study any possible path to solving the puzzle. This process begins when tykes playing tic-tac-toe figure out how to guarantee a draw simply by starting first. Connect Four, checkers, and other simple strategy games come next, with people inevitably refining their play until it approaches perfection.

Mathematicians and game theorists take this natural human inclination to the next level, working to solve strategy games by cracking the code to completely optimal play. This game theory optimal (GTO) system is achieved when players discover behaviors and strategies that produce the best possible outcome for any given situation – regardless of how their opponent responds.

Using nothing but cold logic, deductive reasoning, and plain old common sense, game players and theorists have been able to fully solve dozens of simple games, including Ghost, Hex, and Losing Chess. Assisted by technological advances like computer algorithms and early incarnations of artificial intelligence, humans eventually progressed by attempting to solve the ultimate full-information strategy game – chess.

In 1996, IBM’s chess-playing supercomputer Deep Blue famously lost a six-game match against Grandmaster and world champion Garry Kasparov. But after a series of adjustments was made to its algorithm, Deep Blue returned with a vengeance the following year, narrowly defeating Kasparov 3.5 to 2.5 in an epic duel.

Deep Blue’s historic victory put chess into the partially solved column, and more than 20 years later, computer scientists are still in the business of solving strategy games. But if chess – which puts every piece on the board in plain sight for both players to see – was the perfect starting point to test the limits of artificial intelligence, poker has become the field’s Holy Grail.

Unlike full information games like chess and backgammon, poker variants such as Texas holdem rely on partial information to form their strategic foundations. As the player, you know your own hole cards and the community board cards, but the contents of your opponents’ hand(s) remain unknown. Given a 52-card deck, and just seven known items of information – two hole cards coupled with up to five community cards – a holdem player is left with 45 unseen cards on the river, which means 990 possible two-card combinations (45 x 44 / 2) to choose from when attempting to deduce another player’s potential holdings.
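To make the counting concrete, here is a minimal Python sketch that tallies the two-card combinations a single opponent could hold at each stage of a hand. It assumes a standard 52-card deck, treats hole cards as unordered pairs drawn from the cards you cannot see, and ignores any narrowing based on betting. The function name `opponent_combos` is purely illustrative.

```python
from math import comb

DECK_SIZE = 52  # standard deck, no jokers


def opponent_combos(known_cards: int) -> int:
    """Count the unordered two-card hands one opponent could hold,
    given how many cards (hole cards plus board) you can already see."""
    unseen = DECK_SIZE - known_cards
    return comb(unseen, 2)


# Known cards per street: 2 hole cards plus 0, 3, 4, or 5 board cards.
for street, known in [("preflop", 2), ("flop", 5), ("turn", 6), ("river", 7)]:
    print(f"{street}: {opponent_combos(known)} possible holdings")
```

The count shrinks as more board cards are revealed, but every remaining combination still has to be weighed against how the opponent has bet, which is where the real difficulty lies.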

And when the fluid dynamics of holdem game play are factored in – players can bet with bad hands on a bluff, check with strong hands to trap, and vary their hand selection to confuse unsuspecting opponents – poker immediately stands out as a game that seems to have too much strategy for any computer to truly solve. Throw in the multilayered nature of a game like No Limit Texas holdem, which allows players to precisely size their bets in any amount from the minimum on up, and many observers once believed that poker’s reliance on partial information made the game unsolvable.

In fact, researchers Daphne Koller and Avi Pfeffer affirmed that idea way back in 1995 – one year before Deep Blue burst onto the scene – in a paper entitled “Generating and Solving Imperfect Information Games.” In their report, Koller and Pfeffer made the following observation about the ability of algorithms to decipher the intricacies of partial information games like poker:

“In games such as poker, the players have imperfect information: they have only partial knowledge about the current state of the game. This can result in complex chains of reasoning such as: ‘Since I have two aces showing, but she raised, then she is either bluffing or she has a good hand; but then if I raise a lot, she may realize that I have at least a third ace, so she might fold; so maybe I should underbid, but…

It should be fairly obvious that the standard techniques are inadequate for solving such games: no variant of the minimax algorithm duplicates the type of complex reasoning we just described.”

Fast forward to 2018, however, and the research team would likely be awed by the progress made towards solving poker. Twenty-three years is an eternity in computing, after all, and today scientists are inching ever closer to solving Texas holdem. The first true test of technology’s mettle on the poker table occurred in 2007, when well-known poker pro Phil Laak battled a computer program dubbed “Polaris.”

Phil Laak Splits Two Heads Up Limit Holdem Matches Against “Polaris”


At the time, Jonathan Schaeffer – who chairs the University of Alberta’s Department of Computing Science – led the school’s Computer Poker Research Group. In a collaboration with the Association for the Advancement of Artificial Intelligence in Vancouver, Canada, Schaeffer and his team developed the Polaris program to play heads up Limit holdem.

Limit holdem was thought to be the perfect entry point for solving poker, as the game boils player actions down to four possible choices – check, bet, raise, or fold. Because the betting in Limit holdem is standardized to a certain limit ($2/$4, $5/$10, etc.), Polaris wasn’t forced to concern itself with evaluating bet sizing, just the four main player actions.

In an event proclaimed to be the first ever “Man vs. Machine Poker Challenge,” Polaris was set up to play Phil Laak – a successful pro of the era better known as “The Unabomber” to fans – alongside fellow pro Ali Eslami. The game was $5/$10 blinds with a $10/$20 betting limit, and the players faced off against Polaris in three 500-hand sessions.

Following an up and down battle between Polaris and the pros, Laak and Eslami wound up winning more money in two of the three sessions to declare victory. Schaeffer took the beat in stride, however, pledging to improve Polaris by “teaching” the program how to counter the strategies employed by its human foes. The researcher also made a bold claim regarding the ability of algorithms to solve heads up Limit holdem one day down the road:

“We’re going to keep working on this. Poker is fun. One of these days – within 5 to 10 years – two-person, limit holdem will be solved.”

Heads Up Limit Holdem Declared Solved by “Cepheus” in 2015

Schaeffer’s timeframe proved to be eerily prescient, as it only took eight years for scientists to achieve true mastery of heads up Limit holdem. A research team consisting of Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin penned an article entitled “Heads-Up Limit holdem Poker is Solved,” which was published in the January 2015 issue of Science.

In the abstract to their final findings, the team declared triumphantly that their new CFR+ algorithm had succeeded in solving heads-up Limit holdem:

“Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect-information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect-information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas holdem is now essentially weakly solved.

Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage.”

According to a definition first put forth by game theorist V.L. Allis in his 1994 thesis, a game has been weakly solved if “for the initial position(s), a strategy has been determined to obtain at least the game-theoretic value, for both players, under reasonable resources.” The research team followed up on Polaris with a second poker-playing program known as “Cepheus.”

Bowling – a computer scientist at the University of Alberta and Schaeffer’s departmental colleague – spoke to tech media outlet The Verge to put Cepheus’ abilities in layman’s terms:

“We’re not saying that it’s guaranteed to win money on every single hand. What we’re saying is that, in the long run, if you looked at all the hands that could happen and you averaged all of those, then the computer can’t be losing, at a losing rate – it has to be either breaking even or winning.”

Cepheus evolved directly from Polaris, as Bowling played a part in developing the first holdem-solving program. In fact, while Polaris was unable to defeat Laak and Eslami in the first challenge, a second attempt one year later saw the computer beat a team of poker pros consisting of Nick Grudzien, Kyle Hendon, Rich McRoberts, Victor Acosta, Mark Newhouse, Jay Palansky, and Matt Hawrilenko.

In upgrading to Cepheus, Bowling and his team designed a training course of sorts, consisting of 200 computers equipped with 32 GB of memory and 24 central processing units (CPUs) apiece. Over the course of 70 days, Cepheus essentially played itself in a continuous session of heads up Limit holdem.

Following every single decision the computer made, Cepheus was programmed to evaluate the actual outcome, before assigning “regret” points to suboptimal decisions. Over time, the program “learned” which plays worked best, refining itself over and over again until its play approached game theory optimal (GTO) status.

Here’s how Bowling described the Cepheus training sessions while speaking to The Verge:

“We had this training phase where the program started off playing uniform random against itself, [so] it had no idea what it was doing other than following the rules of the game. It [adapts] by thinking of all possible decision points, and every possible action [that could take place from those points]. ‘What if I raise here, instead of playing randomly, how much more money or less money would I win?’

We could continue to train it, and it would continue to get better. But we stopped at this point because we can’t tell it apart from being perfect.”
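The “regret” idea can be illustrated with a toy example. The sketch below is not the CFR+ algorithm or any of the Cepheus code. It applies plain regret matching, the building block behind that family of methods, to rock-paper-scissors rather than poker. All names are hypothetical, and one assumption differs from Bowling’s description: each player starts with a lopsided habit instead of uniform random play, because uniform play is already optimal in rock-paper-scissors and would leave nothing to learn.

```python
def payoff(mine, theirs):
    """Return +1 for a win, 0 for a tie, -1 for a loss (0=rock, 1=paper, 2=scissors)."""
    return (0, 1, -1)[(mine - theirs) % 3]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no regrets yet: play uniformly


def train(iterations):
    """Self-play for a number of iterations; return each player's average strategy."""
    # Assumption: player 0 starts out always throwing rock, player 1 always paper.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3 for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            own, opp = strategies[player], strategies[1 - player]
            # "What if I had always played action a here instead?"
            action_values = [
                sum(q * payoff(a, b) for b, q in enumerate(opp)) for a in range(3)
            ]
            expected = sum(p * v for p, v in zip(own, action_values))
            for a in range(3):
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += own[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(train(20_000)):
        print(player, [round(p, 3) for p in average])
```

In this toy game the averaged strategies drift toward throwing each option about one third of the time, which is the unexploitable way to play rock-paper-scissors. Cepheus pursued the same kind of target, only across the vastly larger tree of heads up Limit holdem decisions.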

Eventually, Cepheus was made available to human opponents, though no poker pros were involved. Instead, the researchers created an open portal online through which anybody can challenge Cepheus to a heads up Limit holdem match. Bowling’s team also calculated that even an opponent playing a perfect counter-strategy against Cepheus could only produce an expected return of 0.000986 big blinds per game.

Because even a perfect counter-strategy was incapable of generating a meaningful profit against Cepheus, the research team declared heads up Limit holdem to be effectively solved. You can take your shot at challenging the Cepheus computer in a heads-up Limit holdem duel at Bowling and Co.’s website.

Pro Players Clean “Claudico” Program’s Clock in No Limit Holdem Challenge



By 2015 the focus of poker solving research had moved on to an entirely different beast – the fluid gameplay dynamics of No Limit holdem. To that end, Tuomas Sandholm – a professor of computer science at Carnegie Mellon University – developed his own poker playing algorithm known as “Claudico.” The program was instructed to vary its betting patterns on a random basis, in an attempt to mimic the sizings and strategies employed by human players.

Claudico’s first test came against a four-man team composed of Doug Polk, Dong Kim, Jason Les, and Bjorn Li – poker pros who ranked among the world’s top 10 heads up No Limit holdem players at the time. Competing over the course of two weeks, the foursome played 80,000 hands against Claudico, with a combined $170 million in “chips” on the virtual tables.

In the end, Li accumulated profits of $529,033 against Claudico, while Polk won $213,671 and Kim ended $70,491 in the black. Only Les lost to the program, winding up down $80,482. All told, the pros finished the marathon match up $732,713, a result which ostensibly gave mankind a victory over the machines.

But according to Sandholm, the six-figure upswing for Polk and crew wasn’t really a win at all, given the fact that $732,713 represented just 0.431 percent of the total chips in play. In statistical terms at least, that margin of separation wasn’t enough to achieve true victory, so Sandholm declared the match to be a tie in his final research report:

“We knew Claudico was the strongest computer poker program in the world, but we had no idea before this competition how it would fare against four Top 10 poker players. It would have been no shame for Claudico to lose to a set of such talented pros, so even pulling off a statistical tie with them is a tremendous achievement.”
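For readers who want to check the math behind the “statistical tie,” the short sketch below reproduces the pros’ combined result and the margin Sandholm cited, using only the figures reported above. The helper names are illustrative, and the calculation says nothing about variance, which is what a formal significance test would actually measure.

```python
# Per-player results against Claudico, in dollars of virtual chips (as reported).
CLAUDICO_RESULTS = {"Li": 529_033, "Polk": 213_671, "Kim": 70_491, "Les": -80_482}
TOTAL_CHIPS_IN_PLAY = 170_000_000  # combined "chips" on the virtual tables


def net_result(results):
    """Sum the players' individual wins and losses."""
    return sum(results.values())


def margin_percent(net, total_in_play):
    """Express the net result as a percentage of all chips in play."""
    return 100 * net / total_in_play


net = net_result(CLAUDICO_RESULTS)
print(net)                                                  # combined profit for the pros
print(round(margin_percent(net, TOTAL_CHIPS_IN_PLAY), 3))   # margin in percent
```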

In the wake of their win – or statistical stalemate as Sandholm would describe the outcome – Polk and his fellow pros expressed an equal mix of admiration and ambivalence toward Claudico’s skills. According to Polk, the program’s randomized betting patterns did indeed make the match more difficult, but only because Claudico was making moves no human would ever dare.

Polk described a situation wherein most human players might bet between 50 percent and 75 percent of the pot amount, while Claudico would vary its wagers between 10 percent and 1,000 percent:

“There are spots where it plays well and others where I just don’t understand it. Betting $19,000 to win a $700 pot just isn’t something that a person would do.”

Sandholm was undeterred by the setback, however, and he vowed to refine Claudico’s algorithm until a genuine simulation of human betting could be achieved.

“Libratus” Program Finally Beats Humans at Heads Up No Limit Holdem


That pledge came to fruition in 2017, when Sandholm debuted an upgraded heads up No Limit holdem program known as “Libratus.” Working in conjunction with PhD student Noam Brown, Sandholm sought to teach Libratus the secret to successful poker play when betting has no limits – bluffing.

Sandholm described his efforts in an interview with The Telegraph:

“The computer can’t win at poker if it can’t bluff. Developing an AI that can do that successfully is a tremendous step forward scientifically and has numerous applications.

Imagine that your smartphone will someday be able to negotiate the best price on a new car for you. That’s just the beginning.”

A second series of matches was scheduled against four heads up No Limit holdem specialists, with Les and Kim returning alongside Daniel McAulay and Jimmy Chou. This time, the match was expanded to 120,000 hands, or 50 percent more than Claudico played, in order to achieve statistically relevant results.

The challenge took place over 20 days, and Libratus took an immediate lead which it would never relinquish. When nearly three weeks had elapsed, Kim finished down $85,649, McAulay was in the red $277,657, Chou lost $522,857, and Les saw $880,087 of his virtual chips claimed by Libratus.

Overall, the program beat the four pros out of $1,766,250, and with the big blind set at $100, the win rate of 14.7 big blinds per 100 hands played was deemed to be highly significant from a statistical perspective.
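The win rate quoted above can be reproduced from the reported numbers. This is a back-of-the-envelope sketch with illustrative names, taking the four players’ losses, the $100 big blind, and the 120,000-hand total at face value.

```python
# Each pro's loss to Libratus, in dollars of virtual chips (as reported).
LIBRATUS_LOSSES = {"Kim": 85_649, "McAulay": 277_657, "Chou": 522_857, "Les": 880_087}
BIG_BLIND = 100        # dollars
HANDS_PLAYED = 120_000


def big_blinds_per_100(total_won, big_blind, hands):
    """Convert a dollar total into big blinds won per 100 hands."""
    return total_won / big_blind / hands * 100


total_won = sum(LIBRATUS_LOSSES.values())
print(total_won)  # Libratus' combined winnings
print(round(big_blinds_per_100(total_won, BIG_BLIND, HANDS_PLAYED), 1))  # bb/100
```

Expressing the result in big blinds per 100 hands, rather than raw dollars, is what lets results be compared across matches of different lengths and stakes.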

In the wake of Libratus’ resounding victory, Sandholm proudly trumpeted the solving of heads up No Limit holdem to media outlets like The Telegraph:

“The best AI’s ability to do strategic reasoning with imperfect information has now surpassed that of the best humans.”

Kim was humble in defeat, telling tech magazine Wired that Libratus was simply too talented for even skilled human pros to outmaneuver:

“It was about half way through the challenge (with Libratus when) I knew we wouldn’t come back. It had less bugs in the algorithm. We just ran over Claudico, bluffed it everywhere, but this time I felt like it was the other way around.

I didn’t realize how good it was until today. I felt like I was playing against someone who was cheating, like it could see my cards. I’m not accusing it of cheating. It was just that good.”

And just like that, another frontier on the poker-solving front had been conquered by humans – with a helping hand from their computer algorithms.

True Holdem Solving Confined to Heads Up Games… For Now


While the domain of heads up holdem appears to have been solved, researchers like Bowling freely admit that their technology has its limits, specifically in any iteration of holdem that involves more than two players.

When the gameplay dynamics involved in even three-handed poker are factored in, computers are as yet unable to adapt as they do in a strictly one-on-one confrontation. This is due to many variables, of course, but as Bowling observed when speaking to The Verge about Cepheus, the concept of cooperative gameplay lies at the heart of the issue:

“There’s no strategy in a three-player game that can guarantee that it doesn’t lose because it’s actually possible that the other two players in the game might gang up on it. Collusion is illegal in a competitive game, but it’s hard to quantify what that actually means.

We just can’t say as much about whether it produces optimal strategies.”

In other words, when three or more poker players compete for pots, simply winning each hand isn’t necessarily the goal. One player may decide to let a particular hand go when facing two opponents, as the odds inherent to holdem change when facing more than one hand. Similarly, the tournament construct sets up a series of decisions in which cooperative gameplay can be used to achieve benefits imperceptible to a computer algorithm – such as checking down in a three-way pot to ensure a short stack is eliminated.

But just as Schaeffer predicted that heads up Limit holdem would be solved within a decade, most researchers currently believe it’s only a matter of time before multi-player No Limit games are solved as well.

Conclusion

The notion of solving a complex game like Texas holdem is fascinating, both for poker pros looking for an edge, and computer scientists interested in expanding the capabilities of artificial intelligence. Tic-tac-toe and checkers may be one thing, but teaching a computer to play poker – and more importantly, to beat humans at their own game – portends a new age in the relationship between men and their machines.
