Developing a chess program able to play on a chess server will be my next goal for 2016,
so I had better start piling up tons of info for my own study online; whoever needs this information is welcome to it.
Chess Programming Part I: Getting Started
By François Dominic Laramée | Published May 17 2000 01:30 PM
in Artificial Intelligence
This is the first article in a six-part series about
programming computers to play chess, and by extension other similar strategy
games of perfect information.
Chess has been described as the Drosophila melanogaster of
artificial intelligence, in the sense that the game has spawned a great deal of
successful research (including a match victory against the current world
champion and arguably the best player of all time, Garry Kasparov), much like
many of the discoveries in genetics over the years have been made by scientists
studying the tiny fruit fly. This article series will describe some of the
state-of-the-art techniques employed by the most successful programs in the
world, including Deep Blue.
Note that by the time the series is completed (in October),
I will have written a simple implementation of the game in Java, and the source
code will be freely available for download on my web site. So if you want to
see more code samples, be patient; I'll give you plenty in due time!
Games of Perfect Information
Chess is defined as a game of "perfect
information", because both players are aware of the entire state of the
game world at all times: just by looking at the board, you can see which pieces
are alive and where they are located. Checkers, Go, Go-Moku, Backgammon and
Othello are other members of the category, but stud poker is not (you don't
know what cards your opponent is holding in his hands).
Most of the techniques described in this series will apply
more or less equally to all games of perfect information, although the details
will vary from game to game. Obviously, while a search algorithm is a search
algorithm no matter what the domain, move generation and position evaluation
will depend completely on the rules of the game being played!
What We Need
In order to play chess, a computer needs a certain number of
software components. At the very least, these include:
• Some way
to represent a chess board in memory, so that it knows what the state of the
game is.
• Rules to
determine how to generate legal moves, so that it can play without cheating
(and verify that its human opponent is not trying to pull a fast one on it!)
• A
technique to choose the move to make amongst all legal possibilities, so that
it can choose a move instead of being forced to pick one at random.
• A way to
compare moves and positions, so that it makes intelligent choices.
• Some sort
of user interface.
This series will cover all of the above, except the user
interface, which is essentially a 2D game like any other. The rest of this
article describes the major issues related to each component and introduces
some of the concepts to be explored in the series.
Board Representations
In the early days of chess programming, memory was extremely
limited (some programs ran in 8K or less) and the simplest, least expensive
representations were the most effective. A typical chessboard was implemented
as an 8x8 array, with each square represented by a single byte: an empty square
was allocated value 0, a black king could be represented by the number 1, etc.
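A minimal sketch of this byte-per-square scheme in Java. The piece constants and the a1 = 0 ... h8 = 63 square numbering are this sketch's own assumptions; the article fixes no particular encoding:

```java
// Minimal "one byte per square" board: piece codes are arbitrary
// constants chosen for this sketch, with 0 meaning an empty square.
public class MailboxBoard {
    public static final byte EMPTY = 0;
    public static final byte W_PAWN = 1, W_KNIGHT = 2, W_BISHOP = 3,
                             W_ROOK = 4, W_QUEEN = 5, W_KING = 6;
    // Black pieces could reuse the same codes plus an offset.
    public static final byte BLACK = 8;

    private final byte[] squares = new byte[64];  // a1 = 0 ... h8 = 63

    public void put(int file, int rank, byte piece) {
        squares[rank * 8 + file] = piece;
    }

    public byte at(int file, int rank) {
        return squares[rank * 8 + file];
    }
}
```

At 64 bytes plus a few bytes of castling and en passant state, this is about as compact as a direct representation gets.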
When chess programmers started working on 64-bit
workstations and mainframes, more elaborate board representations based on
"bitboards" appeared. Apparently invented in the Soviet Union in the
late 1960's, the bit board is a 64-bit word containing information about one
aspect of the game state, at a rate of 1 bit per square. For example, a
bitboard might contain "the set of squares occupied by black pawns",
another "the set of squares to which a queen on e3 can move", and
another, "the set of white pieces currently attacked by black
knights". Bitboards are versatile and allow fast processing, because many
operations that are repeated very often in the course of a chess game can be
implemented as 1-cycle logic operations on bitboards.
Part II of this series covers board representations in
detail.
Move Generation
The rules of the game determine which moves (if any) the
side to play is allowed to make. In some games, it is easy to look at the board
and determine the legal moves: for example, in tic-tac-toe, any empty square is
a legal move. For chess, however, things are more complicated: each piece has
its own movement rules, pawns capture diagonally and move along a file, it is
illegal to leave a king in check, and the "en passant" captures, pawn
promotions and castling moves require very specific conditions to be legal.
In fact, it turns out that move generation is one of the
most computationally expensive and complicated aspects of chess programming.
Fortunately, the rules of the game allow quite a bit of pre-processing, and I
will describe a set of data structures which can speed up move generation
significantly.
Part III of this series covers this topic.
Search Techniques
To a computer, it is far from obvious which of many legal
moves are "good" and which are "bad". The best way to
discriminate between the two is to look at their consequences (i.e., search
sequences of moves, say 4 for each side, and look at the results). And to make sure
that we make as few mistakes as possible, we will assume that the opponent is
just as good as we are. This is the basic principle underlying the minimax
search algorithm, which is at the root of all chess programs.
Unfortunately, minimax's complexity is O(b^n), where b
("branching factor") is the number of legal moves available on
average at any given time and n (the depth) is the number of "plies"
you look ahead, where one ply is one move by one side. This number grows impossibly
fast, so a considerable amount of work has been done to develop algorithms that
minimize the effort expended on search for a given depth. Iterative-deepening
Alphabeta, NegaScout and MTD(f) are among the most successful of these
algorithms, and they will be described in Part IV, along with the data
structures and heuristics which make strong play possible, such as
transposition tables and the history/killer heuristic.
Another major source of headaches for chess programmers is
the "horizon effect", first described by Hans Berliner. Suppose that
your program searches to a depth of 8-ply, and that it discovers to its horror
that the opponent will capture its queen at ply 6. Left to its own devices, the
program will then proceed to throw its bishops to the wolves so that it will
delay the queen capture to ply 10, which it can't see because its search ends
at ply 8. From the program's point of view, the queen is "saved",
because the capture is no longer visible... But it has lost a bishop, and the
queen capture reappears during the next move's search. It turns out that
finding a position where a program can reason correctly about the relative
strength of the forces present is not a trivial task at all, and that
searching every line of play to the same depth is tantamount to suicide.
Numerous techniques have been developed to defeat the horizon effect;
quiescence search and Deep Blue's singular extensions are among the topics
covered in Part V on advanced search.
Evaluation
Finally, the program must have some way of assessing whether
a given position means that it is ahead or that it has lost the game. This
evaluation depends heavily upon the rules of the game: while "material
balance" (i.e., the number and value of the pieces on the board) is the
dominant factor in chess, because being ahead by as little as a single pawn can
often guarantee a victory for a strong player, it is of no significance in
Go-Moku and downright misleading in Othello, where you are often better off
with fewer pieces on the board until the very last moment.
Developing a useful evaluation function is a difficult and
sometimes frustrating task. Part VI of this series covers the efforts made in
that area by the developers of some of the most successful chess programs of
all time, including Chess 4.5, Cray Blitz and Belle.
Conclusion
Now that we know which pieces we will need to complete the
puzzle, it is time to get started on that first corner. Next month, I will
describe the most popular techniques used to represent chess boards in current
games. See you there!
François Dominic Laramée, April 2000
Chess Programming Part II: Data Structures
Last month, I presented the major building blocks required
to write a program to play chess, or any other two-player game of perfect
information. Today, I will discuss in a bit more detail the most fundamental of
these building blocks: the internal representation of the game board.
You may be surprised to notice that, for all intents and
purposes, the state of the art in this area has not changed in thirty years.
This is due to a combination of ingenuity (i.e., smart people made smart
choices very early in the field's history) and necessity (i.e., good data
structures were a pre-requisite to everything else, and without these effective
techniques, not much would have been achieved.)
While we're at it, I will also present three support data
structures which, although not absolutely required to make the computer play,
are invaluable if you want it to play well. Of these, two (one of which
consumes ungodly amounts of memory) are designed to accelerate search through
the game tree, while the third is used to speed up move generation.
Before we go any further, a word to the wise: in chess as in
any other game, the simplest data structure that will get the job done is
usually the one you should use. While chess programmers have developed numerous
clever data representation tricks to make their programs go faster, very simple
stuff is quite sufficient in many other games. If you are a novice working on a
game for which there is limited literature, start with something easy,
encapsulate it well, and you can experiment with more advanced representations
once your program works.
Basic Board Representations
Back in the 1970's, personal computer memory was at a
premium (that's an understatement if there ever was one!), so the more compact
the board representation, the better. There is a lot to be said for the most
self-evident scheme: a 64-byte array, where each byte represents a single
square on the board and contains an integer constant representing the piece
located in that square. (Any chess board data structure also needs a few bytes
of storage to track down en passant pawn capture opportunities and castling
privileges, but we'll ignore that for now, since this is usually implemented
separately, and pretty much always the same way.)
A few refinements on this technique soon became popular:
• The
original SARGON extended the 64-byte array by surrounding it with two layers of
"bogus squares" containing sentinel values marking the squares as
illegal. This trick accelerated move generation: for example, a bishop would
generate moves by sliding one square at a time until it reached an illegal
square, then stop. No need for complicated a priori computations to make sure
that a move would not take a piece out of the memory area associated with the
board. The second layer of fake squares is required by knight moves: for
example, a knight on a corner square might try to jump out of bounds by two
columns, so a single protection layer would be no protection at all!
• MYCHESS
reversed the process and represented the board in only 32 bytes, each of which
was associated with a single piece (i.e., the white king, the black King's
Knight's pawn, etc.) and contained the number of the square where that piece
was located, or a sentinel value if the piece had been captured. This technique
had a serious drawback: it was impossible to promote a pawn to a piece which
had not already been captured. Later versions of the program fixed this
problem.
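The SARGON border trick from the first bullet can be sketched as follows. The 12x12 layout, the sentinel value and the knight offsets are assumptions of this sketch, not SARGON's actual constants; the point is only that an out-of-bounds jump lands on a sentinel square instead of wrapping around or indexing outside the array:

```java
// Sketch of a sentinel-padded board: the real 8x8 area is embedded in a
// 12x12 array whose two-square border holds OFF_BOARD, so a sliding or
// jumping piece that leaves the board needs no explicit bounds check.
public class PaddedBoard {
    public static final byte OFF_BOARD = -1;
    public static final byte EMPTY = 0;

    final byte[] squares = new byte[144];        // 12 x 12

    public PaddedBoard() {
        java.util.Arrays.fill(squares, OFF_BOARD);
        for (int rank = 0; rank < 8; rank++)
            for (int file = 0; file < 8; file++)
                squares[index(file, rank)] = EMPTY;
    }

    // Map a real board coordinate into the padded array.
    static int index(int file, int rank) {
        return (rank + 2) * 12 + (file + 2);
    }

    // Knight jumps as offsets in the 12-wide array; a jump that leaves
    // the board hits OFF_BOARD and is discarded without any bounds test.
    static final int[] KNIGHT_OFFSETS = { -25, -23, -14, -10, 10, 14, 23, 25 };

    public java.util.List<Integer> knightTargets(int file, int rank) {
        java.util.List<Integer> targets = new java.util.ArrayList<>();
        int from = index(file, rank);
        for (int off : KNIGHT_OFFSETS)
            if (squares[from + off] != OFF_BOARD)
                targets.add(from + off);
        return targets;
    }
}
```

A knight on a corner square finds only two non-sentinel destinations, which is exactly why the border must be two squares deep.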
Today, this type of super-miserly structure would only be
useful (maybe) if you wrote a chess program for a Palm Pilot, a cell phone or a
set-top box where 80-90% of resources are consumed by the operating system and
non-game services. However, for some other types of games, there is really no
alternative!
For more information on vintage chess programs, read David
Welsh's book, Computer Chess, published in 1984.
Bit Boards
For many games, it is hard to imagine better representations
than the simple one-square, one-slot array. However, for chess, checkers and
other games played on a 64-square board, a clever trick was developed
(apparently by the KAISSA team in the Soviet Union) in the late 60's: the bit
board.
KAISSA ran on a mainframe equipped with a 64-bit processor.
Now, 64 happens to be the number of squares on a chess board, so it was
possible to use a single memory word to represent a yes-or-no or true-or-false
predicate for the whole board. For example, one bitboard might contain the
answer to "Is there a white piece here?" for each square of the
board.
Therefore, the state of a chess game could be completely
represented by 12 bitboards: one each for the presence of white pawns, white
rooks, black pawns, etc. Adding two bitboards for "all white pieces"
and "all black pieces" might accelerate further computations. You
might also want to hold a database of bitboards representing the squares
attacked by a certain piece on a certain square, etc.; these constants come in
handy at move generation time.
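In Java, the twelve piece bitboards described above might look like this; the bit numbering (a1 = bit 0 ... h8 = bit 63) and the piece-type indices are this sketch's assumptions:

```java
// Sketch: twelve piece bitboards as Java longs, plus the derived
// "all white pieces" / "all black pieces" boards mentioned above.
public class BitboardPosition {
    // [color][pieceType]; color 0 = white, 1 = black;
    // type 0 = pawn, 1 = knight, 2 = bishop, 3 = rook, 4 = queen, 5 = king.
    public long[][] pieces = new long[2][6];

    public static long squareBit(int file, int rank) {
        return 1L << (rank * 8 + file);
    }

    // OR together one side's six bitboards to get its occupancy board.
    public long occupied(int color) {
        long all = 0L;
        for (long bb : pieces[color]) all |= bb;
        return all;
    }
}
```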
The main justification for bit boards is that a lot of
useful operations can be performed using the processor's instruction set's
1-cycle logical operators. For example, suppose you need to verify whether the
white queen is checking the black king. With a simple square-array
representation, you would need to:
• Find the
queen's position, which requires a linear search of the array and may take 64
load-test cycles.
• Examine
the squares to which it is able to move, in all eight directions, until you
either find the king or run out of possible moves.
This process is always time-consuming, more so when the
queen happens to be located near the end of the array, and even more so when
there is no check to be found, which is almost always the case!
With a bitboard representation, you would:
• Load the
"white queen position" bitboard.
• Use it to
index the database of bitboards representing squares attacked by queens.
• Logical-AND
that bitboard with the one for "black king position".
If the result is non-zero, then the white queen is checking
the black king. Assuming that the attack bitboard database is in cache memory,
the entire operation has consumed 3-4 clock cycles!
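The three-step check test can be sketched as below. For simplicity the attack database here is computed for an empty board, so blocking pieces are ignored; a real program must also account for blockers, but the core of the test really is a single AND:

```java
// Sketch of the queen-check test: index a precomputed "squares a queen
// attacks from square s" table (empty-board assumption), then AND with
// the king's bitboard; a non-zero result means check.
public class QueenCheck {
    static final long[] QUEEN_ATTACKS = new long[64];
    static {
        int[][] dirs = { {1,0},{-1,0},{0,1},{0,-1},{1,1},{1,-1},{-1,1},{-1,-1} };
        for (int sq = 0; sq < 64; sq++) {
            int f = sq % 8, r = sq / 8;
            for (int[] d : dirs)
                for (int tf = f + d[0], tr = r + d[1];
                     tf >= 0 && tf < 8 && tr >= 0 && tr < 8;
                     tf += d[0], tr += d[1])
                    QUEEN_ATTACKS[sq] |= 1L << (tr * 8 + tf);
        }
    }

    // Non-zero AND means the queen attacks the king's square.
    public static boolean queenChecks(int queenSquare, long kingBitboard) {
        return (QUEEN_ATTACKS[queenSquare] & kingBitboard) != 0;
    }
}
```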
Another example: if you need to generate the moves of the
white knights currently on the board, just find the attack bitboards associated
with the positions occupied by the knights and AND them with the logical
complement of the bitboard representing "all squares occupied by white
pieces" (i.e., apply the logical NOT operator to the bitboard), because the
only restriction on knights is that they can not capture their own pieces!
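The knight example is even cleaner, because knight attacks do not depend on blocking pieces, so an empty-board attack table is exact. A sketch (square numbering a1 = 0 ... h8 = 63 is assumed):

```java
// Sketch of the knight-move example: a precomputed knight attack
// bitboard, ANDed with the complement of "all white pieces".
public class KnightMoves {
    static final long[] KNIGHT_ATTACKS = new long[64];
    static {
        int[][] jumps = { {1,2},{2,1},{2,-1},{1,-2},{-1,-2},{-2,-1},{-2,1},{-1,2} };
        for (int sq = 0; sq < 64; sq++) {
            int f = sq % 8, r = sq / 8;
            for (int[] j : jumps) {
                int tf = f + j[0], tr = r + j[1];
                if (tf >= 0 && tf < 8 && tr >= 0 && tr < 8)
                    KNIGHT_ATTACKS[sq] |= 1L << (tr * 8 + tf);
            }
        }
    }

    // Destinations for a white knight: every attacked square except
    // those already occupied by white pieces.
    public static long knightTargets(int knightSquare, long whitePieces) {
        return KNIGHT_ATTACKS[knightSquare] & ~whitePieces;
    }
}
```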
For a (slightly) more detailed discussion of bitboards, see
the article describing the CHESS 4.5 program developed at Northwestern
University, in Peter Frey's book Chess Skill in Man and Machine; there are at
least two editions of this book, published in 1977 and 1981.
Note: To this day, few personal computers use true 64-bit
processors, so at least some of the speed advantages associated with bitboards
are lost. Still, the technique is pervasive, and quite useful.
Transposition Tables
In chess, there are often many ways to reach the same
position. For example, it doesn't matter whether you play 1. P-K4 ... 2. P-Q4
or 1. P-Q4... 2. P-K4; the game ends up in the same state. Achieving identical
positions in different ways is called transposing.
Now, of course, if your program has just spent considerable
effort searching and evaluating the position resulting from 1. P-K4 ... 2. P-Q4,
it would be nice if it were able to remember the results and avoid repeating
this tedious work for 1. P-Q4... 2. P-K4. This is why all chess programs, since
at least Richard Greenblatt's Mac Hack VI in the late 1960's, have incorporated
a transposition table.
A transposition table is a repository of past search
results, usually implemented as a hash dictionary or similar structure to
achieve maximum speed. When a position has been searched, the results (i.e.,
evaluation, depth of the search performed from this position, best move, etc.)
are stored in the table. Then, when new positions have to be searched, we query
the table first: if suitable results already exist for a specific position, we
use them and bypass the search entirely.
There are numerous advantages to this process, including:
• Speed. In
situations where there are lots of possible transpositions (i.e., in the
endgame, when there are few pieces on the board), the table quickly fills up
with useful results and 90% or more of all positions generated will be found in
it.
• Free
depth. Suppose you need to search a given position to a certain depth; say,
four-ply (i.e., two moves for each player) ahead. If the transposition table
already contains a six-ply result for this position, not only do you avoid the
search, but you get more accurate results than you would have if you had been
forced to do the work!
• Versatility.
Every chess program has an "opening book" of some sort, i.e., a list
of well-known positions and best moves selected from the chess literature and
fed to the program to prevent it from making a fool out of itself (and its
programmer) at the very beginning of the game. Since the opening book's modus
operandi is identical to the transposition table (i.e., look up the position,
and spit out the results if there are any), why not initialize the table with
the opening book's content at the beginning of the game? This way, if the flow
of the game ever leaves the opening book and later transposes back into a
position that was in it, there is a chance that the transposition table will
still contain the appropriate information for the program to use.
The only real drawback of the transposition table mechanism
is its voracity in terms of memory. To be of any use whatsoever, the table must
contain several thousand entries; a million or more is even better. At 16 bytes
or so per entry, this can become a problem in memory-starved environments.
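A bare-bones transposition table might look like the sketch below: a fixed-size array indexed by the low bits of the position's hash key. The entry fields and the "deeper result wins" probe rule are this sketch's own choices, not those of any particular program:

```java
// Minimal transposition-table sketch: store search results keyed by the
// position's hash; reuse a stored result only if it was produced by a
// search at least as deep as the one now requested.
public class TranspositionTable {
    public static class Entry {
        long key;      // full hash key, to detect index collisions
        int depth;     // depth of the search that produced this result
        int score;
        int bestMove;
    }

    private final Entry[] entries;
    private final int mask;

    public TranspositionTable(int sizePowerOfTwo) {
        entries = new Entry[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;   // low bits of the key pick the slot
    }

    public void store(long key, int depth, int score, int bestMove) {
        Entry e = new Entry();
        e.key = key; e.depth = depth; e.score = score; e.bestMove = bestMove;
        entries[(int) (key & mask)] = e;
    }

    public Entry probe(long key, int minDepth) {
        Entry e = entries[(int) (key & mask)];
        return (e != null && e.key == key && e.depth >= minDepth) ? e : null;
    }
}
```

At 16 bytes or so of payload per entry, a million-entry table of this shape lands in the tens of megabytes, which is where the memory pressure mentioned above comes from.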
Other uses of transposition tables
CHESS 4.5 also employed hash tables to store the results of
then-expensive computations which rarely changed in value or alternated between
a small number of possible choices:
• Pawn
structure. Indexed only on the positions of pawns, this table requires little
storage, and since there are comparatively few possible pawn moves, it changes
so rarely that 99% of positions result in hash table hits.
• Material
balance, i.e., the relative strength of the forces on the board, which only
changes after a capture or a pawn promotion.
This may not be as useful in these days of plentiful CPU cycles,
but the lesson is a valuable one: some measure of pre-processing can save a lot
of computation at the cost of a little memory. Study your game carefully; there
may be room for improvement here.
Generating Hash Keys for Chess Boards
The transposition tables described above are usually
implemented as hash dictionaries of some sort, which brings up the following
topic: how do you generate hashing keys from chess boards, quickly and
efficiently?
The following scheme was described by Zobrist in 1970:
• Generate
12x64 N-bit random numbers (where the transposition table has 2^N entries) and
store them in a table. Each random number is associated with a given piece on a
given square (i.e., black rook on H4, etc.) An empty square is represented by a
null word.
• Start
with a null hash key.
• Scan the
board; when you encounter a piece, XOR its random number to the current hash
key. Repeat until the entire board has been examined.
An interesting side effect of the scheme is that it will be
very easy to update the hash value after a move, without re-scanning the entire
board. Remember the old XOR-graphics? The way you XOR'ed a bitmap on top of a
background to make it appear (usually in distorted colors), and XOR'ed it again
to make it go away and restore the background to its original state? This works
similarly. Say, for example, that a white rook on H1 captures a black pawn on
H4. To update the hash key, XOR the "white rook on H1" random number
once again (to "erase" its effects), the "black pawn on H4"
(to destroy it) and the "white rook on H4" (to add a contribution
from the new rook position).
Use the exact same method, with different random numbers, to
generate a second key (or "hash lock") to store in the transposition
table along with the truly useful information. This is used to detect and avoid
collisions: if, by chance, two boards hash to the exact same key and collide in
the transposition table, odds are extremely low that they will also hash to the
same lock!
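Zobrist's scheme, including the incremental capture update from the rook-takes-pawn example above, can be sketched as follows; the piece indexing and the fixed random seed are assumptions made so the sketch is reproducible:

```java
// Sketch of Zobrist hashing: 12 x 64 random numbers, one per
// (piece, square) pair, XORed together for every piece on the board.
public class Zobrist {
    static final long[][] RANDOMS = new long[12][64]; // [piece][square]
    static {
        java.util.Random rng = new java.util.Random(2016); // fixed seed, for this sketch
        for (int p = 0; p < 12; p++)
            for (int s = 0; s < 64; s++)
                RANDOMS[p][s] = rng.nextLong();
    }

    // Full scan: XOR in the number of every (piece, square) pair present.
    public static long hash(int[] board) { // board[sq] = piece index, -1 if empty
        long key = 0L;
        for (int sq = 0; sq < 64; sq++)
            if (board[sq] >= 0)
                key ^= RANDOMS[board[sq]][sq];
        return key;
    }

    // Incremental update for a capture, as in the text: XOR out the
    // mover's old square, XOR out the victim, XOR in the mover's new square.
    public static long applyCapture(long key, int mover, int from, int victim, int to) {
        return key ^ RANDOMS[mover][from] ^ RANDOMS[victim][to] ^ RANDOMS[mover][to];
    }
}
```

Because XOR is its own inverse, the incremental update and a full re-scan of the resulting board yield the same key, which is easy to verify.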
History Tables
The "history heuristic" is a descendant of the
"killer move" technique. A thorough explanation belongs in the article
on search; for now, it will be sufficient to say that a history table should be
maintained to note which moves have had interesting results in the past (i.e.,
which ones have cut-off search quickly along a continuation) and should
therefore be tried again at a later time. The history table is a simple 64x64
array of integer counters; when the search algorithm decides that a certain
move has proven useful, it will ask the history table to increase its value.
The values stored in the table will then be used to sort moves and make sure
that more "historically powerful" ones will be tried first.
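A history table is small enough to sketch in full. The depth-squared bonus below is one common convention for weighting deeper cutoffs more heavily; the article does not prescribe a particular increment:

```java
// Sketch of the 64x64 history table: indexed by (from, to) squares,
// incremented when a move causes a cutoff, consulted when sorting moves.
public class HistoryTable {
    private final int[][] counters = new int[64][64];

    public void recordCutoff(int from, int to, int depth) {
        counters[from][to] += depth * depth;  // assumed bonus scheme
    }

    public int score(int from, int to) {
        return counters[from][to];
    }
}
```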
Pre-processing move generation
Move generation (i.e., deciding which moves are legal given
a specific position) is, with position evaluation, the most computationally
expensive part of chess programming. Therefore, a bit of pre-processing in this
area can go a long way towards speeding up the entire game.
The scheme presented by Jean Goulet in his 1984 thesis Data
Structures for Chess (McGill University) is a personal favorite. In a nutshell:
• For move
generation purposes, piece color is irrelevant except for pawns which move in
opposite directions.
• There are
64 x 5 = 320 combinations of major piece and square from which to move, 48
squares on which a black pawn can be located (they can never retreat to the
back rank, and they get promoted as soon as they reach the eighth rank), and 48
where a white pawn can be located.
• Let us
define a "ray" of moves as a sequence of moves by a piece, from a
certain square, in the same direction. For example, all queen moves towards the
"north" of the board from square H3 make up a ray.
• For each
piece on each square, there are a certain number of rays along which movement
might be possible. For example, a king in the middle of the board may be able
to move in 8 different directions, while a bishop trapped in a corner only has
one ray of escape possible.
• Prior to
the game, compute a database of all rays for all pieces on all squares,
assuming an empty board (i.e., movement is limited only by the edges and not by
other pieces).
• When you
generate moves for a piece on a square, scan each of its rays until you either
reach the end of the ray or hit a piece. If it is an enemy piece, this last
move is a capture. If it is a friendly piece, this last move is impossible.
With a properly designed database, move generation is
reduced to a simple, mostly linear lookup; it requires virtually no computation
at all. And the entire thing holds within a few dozen kilobytes; mere chump
change compared to the transposition table!
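The ray database can be sketched as below; only rook rays are shown to keep it short, and the layout (one array of destination squares per direction, computed for an empty board) is this sketch's interpretation of the scheme rather than Goulet's exact structure:

```java
// Sketch of a precomputed ray database: for each square, the squares a
// rook could reach in each direction on an empty board, in order. Move
// generation scans each ray until it ends or hits a piece.
public class RayDatabase {
    // ROOK_RAYS[square][direction] = destinations going N, S, E, W.
    static final int[][][] ROOK_RAYS = new int[64][4][];
    static {
        int[][] dirs = { {0,1},{0,-1},{1,0},{-1,0} };
        for (int sq = 0; sq < 64; sq++) {
            int f = sq % 8, r = sq / 8;
            for (int d = 0; d < 4; d++) {
                java.util.List<Integer> ray = new java.util.ArrayList<>();
                for (int tf = f + dirs[d][0], tr = r + dirs[d][1];
                     tf >= 0 && tf < 8 && tr >= 0 && tr < 8;
                     tf += dirs[d][0], tr += dirs[d][1])
                    ray.add(tr * 8 + tf);
                ROOK_RAYS[sq][d] = ray.stream().mapToInt(Integer::intValue).toArray();
            }
        }
    }

    // A hit on a friendly piece ends the ray before that square; a hit
    // on an enemy piece is a capture and ends the ray after it.
    public static java.util.List<Integer> rookMoves(int sq, long own, long enemy) {
        java.util.List<Integer> moves = new java.util.ArrayList<>();
        for (int[] ray : ROOK_RAYS[sq]) {
            for (int target : ray) {
                if ((own & (1L << target)) != 0) break;
                moves.add(target);
                if ((enemy & (1L << target)) != 0) break;
            }
        }
        return moves;
    }
}
```

Note that the per-move work is indeed a nearly computation-free linear scan: every square visited either becomes a move or terminates its ray.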
All of the techniques described above (bit boards, history,
transposition table, pre-processed move database) will be illustrated in my own
chess program, to be posted when I finish writing this series. Next month, I will
examine move generation in more detail.
François Dominic Laramée, June 2000
Chess Programming Part III: Move Generation
Last month, I finished Part II of this series by introducing
a data structure which pre-processes and stores most of the work related to
chess move generation. This time, I will return to the topic in more detail,
examining the two major move generation strategies and explaining how to choose
between them for a given application.
The Dilemma
No matter how you slice it, chess is a complicated game,
especially for a computer.
In any given situation, a player may have 30 or more legal
moves to choose from, some good, some suicidal. For trained humans, it is easy
to characterize the majority of these moves as foolish or pointless: even
beginners learn that they had better come up with a solid plan before leaving
their queen in a position where she can be captured, and masters know (more
through instinctive pattern matching than by conscious effort) which 1-2 moves
are likely to be the strongest in the position.
However, coding this information (especially the unconscious
type!) into a computer has proven spectacularly difficult, and the strongest
programs (except, to some extent, Hans Berliner's Hitech and its siblings) have
given up on this approach, instead relying on "brute force": if you
can analyze all possible moves fast enough and predict their consequences far
enough down the road, it doesn't matter whether or not you start with a clear
idea of what you are trying to accomplish, because you'll discover a good move
eventually. Therefore, move generation and search should be made as fast as
possible, so as to minimize the loss of effort required by the brute force
method.
Search will be discussed in Parts IV and V of this series;
this month, we will concentrate on move generation. Historically, three major
strategies have been used in this area:
• Selective
generation: Examine the board, come up with a small number of
"likely" moves and discard everything else.
• Incremental
generation: Generate a few moves, hoping that one will prove so good or so bad
that search along the current line of play can be terminated before generating
the others.
• Complete
generation: Generate all moves, hoping that the transposition table (discussed
in Part II) will contain relevant information on one of them and that there
will be no need to search anything at all.
Selective generation (and its associated search technique,
called forward pruning) has all but disappeared since the mid 1970's. As for
the other two, they represent two sides of the same coin, trading off effort in
move generation vs search. In games where move generation is easy and/or there
are lots of ways to transpose into the same positions (i.e., Othello and
GoMoku), complete generation may be most efficient, while in games where move
generation rules are complicated, incremental generation will usually work
faster. Both strategies are sound and viable, however.
The Early Days: Forward Pruning
In 1949 (yes, really), Claude Shannon described two ways to
build a chess-playing algorithm:
• Look at
all possible moves, and all the possible moves resulting from each,
recursively.
• Only
examine the "best" moves, as determined from a detailed analysis of a
position, and then only the "best" replies to each, recursively.
At first, the second alternative seemed more likely to
succeed. After all, this is how human players do it, and it seems logical to
assume that looking at only a few moves at each node will be faster than
looking at all of them. Unfortunately, the results disproved the theory: programs
using selective search just didn't play very well. At best, they achieved low
to mid-level club player ratings, often committing humiliating blunders at the
worst possible time. Beating a world champion (or even playing reasonably well
on a consistent basis) was beyond their reach.
The problem is that, for a "best move generator"
to be any good, it has to be almost perfect. Suppose that a program is equipped
with a function that looks for the 5 best moves in a position, and that the
objective best move is among those 5 at least 95% of the time. (That, by the
way, is a big assumption.) This means that the probability that the generator's
list will always contain the best choice at all times, during a 40-move game,
is less than 13%. Even a god-like generator with 99% accuracy will blunder at
least once in about a third of its games, while a more reasonable 90%-accurate
function will play an entire game without messing up less than 1.5% of the
time!
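The arithmetic behind these figures is just the probability of never missing the best move over 40 consecutive moves, i.e. p raised to the 40th power, which is easy to verify:

```java
// Quick check of the figures above: a generator whose move list contains
// the objectively best move with probability p goes a whole 40-move game
// without ever missing it with probability p^40.
public class GeneratorOdds {
    public static double neverMisses(double p, int moves) {
        return Math.pow(p, moves);
    }

    public static void main(String[] args) {
        System.out.printf("95%%: %.3f%n", neverMisses(0.95, 40)); // ~0.129
        System.out.printf("99%%: %.3f%n", neverMisses(0.99, 40)); // ~0.669
        System.out.printf("90%%: %.4f%n", neverMisses(0.90, 40)); // ~0.0148
    }
}
```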
In the mid 1970's, the legendary Northwestern team of Slate
and Atkin decided to do away with the complicated best-move generator; it
turned out that the time they saved in avoiding costly analysis during move
generation was enough to cover the added expense of a full-width search (i.e.,
examining all possible moves). To all intents and purposes, this discovery
buried forward pruning for good.
Botvinnik's work
An extreme example of a forward pruning algorithm was
developed in the Soviet Union, in the 1970's and early 1980's, under the
tutelage of former World chess champion Mikhail Botvinnik. Botvinnik was
convinced that the only way for a computer to ever play grandmaster-level chess
was to play like a grandmaster, i.e., examine only a few moves, but in great
depth and detail. His program sought to identify and implement the sort of
high-level plans and patterns which a world-class player might come up with
during a game. While it led to some fascinating books, revealing insights into
the master's mind which only Botvinnik could provide, this work unfortunately
did not reach its lofty goals.
Generating All Moves Up-Front
Once forward pruning is eliminated from consideration, the
most straightforward way to implement full-width searching consists of:
• Finding
all of the legal moves available in a position.
• Ordering
them in some way, hopefully speeding up search by picking an advantageous
order.
• Searching
them all one at a time, until all moves have been examined or a cutoff occurs.
Early programs, for example Sargon, did this by scanning the
board one square at a time, looking for pieces of the moving side, and
computing possible move destinations on the fly. Memory being at a premium, the
expenditure of CPU time required to re-compute these moves every time was a
necessary evil.
These days, a pre-processed data structure like the one I
described last month can avoid a considerable amount of computation and code
complexity, at the cost of a few dozen kilobytes. When this super-fast move
generation is combined with transposition tables, an added bonus may fall into
the programmer's lap: if even one of the moves has already been searched
before, and if its evaluation (as retrieved from the table) is such that it
triggers a cutoff, there will be no need to search any of the moves at all!
Obviously, the larger the transposition table, and the higher the probability
of a transposition given the rules of the game, the bigger the average payoff.
Not only is this technique conceptually simple, it is also
the most "universal": while there are easy ways to segregate chess
moves into different categories (i.e., captures and non-captures), other games
like Othello do not provide such convenient tools to work with. Therefore, if
you intend your program to play more than one game, you should probably pick
this technique instead of the one described in the next section.
One Move At A Time
Sophisticated chess programs since at least CHESS 4.5 have
adopted the opposite strategy: generate a few moves at a time, search them, and
if a cutoff can be caused, there will be no need to generate the rest of the
moves.
A combination of factors has made this technique popular:
• Search does not require much memory. Programs of the 1970's had to make do with small transposition tables and could ill afford pre-processed move generation data structures, limiting the usefulness of the complete generation scheme described above.
• Move generation is more complicated in chess than in most other games, with castling, en passant pawn captures, and different rules for each piece type to contend with.
• Very often, a refutation (i.e., a move that will cause a cutoff) is a capture. For example, if Player A leaves a queen "en prise", the opponent will quickly grab it and wreck A's game. Since there are usually few possible captures in a position, and since generating captures separately is relatively easy, computing captures is often enough to trigger a cutoff at little expense.
• Captures are usually among the few (if not the only) types of moves examined during quiescence search (which will be discussed in a later article), so generating them separately is doubly useful.
Therefore, many programs will generate captures first, often
starting with those of highly valuable pieces, and look for a quick cutoff. A
number of techniques have been developed to speed up piece-meal move
generation, most of them involving bitboards:
• CHESS 4.5 maintains two sets of 64 bitboards, with one bitboard per square on the board. One contains the squares attacked by the piece, if any, located on that square; the other is the transpose, containing all squares occupied by pieces attacking that square. Therefore, if the program is looking for moves that would capture Black's queen, it looks for its position in the basic bitboard database, uses these new data structures to identify the pieces which attack the queen's position, and only generates moves for these pieces.
• Maintaining these "attack bitboards" after each move requires some rather involved technical wizardry, but a tool called a rotated bitboard can accelerate the work significantly. The best discussion of rotated bitboards I have seen was written by James F. Swafford, and can be found on the web at http://sr5.xoom.com/...prg/rotated.htm
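To make the two-set idea concrete, here is a minimal Java sketch using 64-bit longs as bitboards and knights only; the table and method names are mine, and a real program would of course cover every piece type and update the tables incrementally after each move.

```java
// Minimal sketch of paired attack tables, knights only. attacksFrom[s] holds
// the squares a knight on square s attacks; attackedBy[s] is the transpose:
// every square holding a knight that attacks s. Squares are 0..63, a1 = 0.
public class AttackBoards {
    static final long[] attacksFrom = new long[64];
    static final long[] attackedBy = new long[64];

    static {
        int[][] jumps = {{1,2},{2,1},{2,-1},{1,-2},{-1,-2},{-2,-1},{-2,1},{-1,2}};
        for (int s = 0; s < 64; s++) {
            int f = s % 8, r = s / 8;                 // file and rank of s
            for (int[] j : jumps) {
                int nf = f + j[0], nr = r + j[1];
                if (nf >= 0 && nf < 8 && nr >= 0 && nr < 8) {
                    int t = nr * 8 + nf;
                    attacksFrom[s] |= 1L << t;        // s attacks t...
                    attackedBy[t]  |= 1L << s;        // ...so s is in t's transpose
                }
            }
        }
    }

    // Bitboard of squares (among 'knights') whose knight attacks 'target'.
    static long attackersOf(int target, long knights) {
        return attackedBy[target] & knights;
    }
}
```

With these tables, "who attacks the queen's square?" is a single AND instead of a scan of the whole board.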
Ordering Moves To Speed Up Search
As we will see next time, search efficiency depends on the
order in which moves are searched. The gains and losses related to good or poor
move ordering are not trivial: a good ordering, defined as one which will cause
a large number of cutoffs, will result in a search tree about the square root
of the size of the tree associated with the worst possible ordering!
Unfortunately, it turns out that the best possible ordering
is simply to try the best move first. And of course, if you knew
which moves were best, you wouldn't be searching in the first place. Still,
there are ways to "guess" which moves are more likely to be good than
others. For example, you might start with captures, pawn promotions (which
dramatically change material balance on the board), or checks (which often
allow few legal responses); follow with moves which caused recent cutoffs at the
same depth in the tree (so-called "killer moves"), and then look at
the rest. This is the justification for iterative deepening alphabeta, which we
will discuss in detail next month, as well as the history table we talked about
last time. Note that these techniques do not constitute forward pruning: all
moves will be examined eventually; those which appear bad are only delayed, not
eliminated from consideration.
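As a rough illustration of such an ordering scheme, here is a Java sketch that simply scores and sorts moves: captures first, then promotions, then killers, then everything else. The Move fields and the exact weights are placeholders of my own, not a prescription.

```java
import java.util.Arrays;

// Sketch of the move-ordering policy described above. Quiet moves are merely
// delayed to the back of the list, never pruned.
public class MoveOrder {
    static class Move {
        boolean isCapture, isPromotion, isKiller;
        Move(boolean c, boolean p, boolean k) { isCapture = c; isPromotion = p; isKiller = k; }
        int orderScore() {
            if (isCapture)   return 3;   // most likely to cause a cutoff
            if (isPromotion) return 2;   // drastic material change
            if (isKiller)    return 1;   // refuted a sibling move recently
            return 0;
        }
    }

    // Sort best-guess moves to the front.
    static void order(Move[] moves) {
        Arrays.sort(moves, (a, b) -> b.orderScore() - a.orderScore());
    }
}
```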
A final note: in chess, some moves may be illegal because
they leave the King in check. However, such an occurrence is quite rare, and it
turns out that validating moves during generation would cost a tremendous
amount of effort. It is more efficient to delay the check until the move is
actually searched: for example, if capturing the King would be a valid reply to
Move X, then Move X is illegal and search should be terminated. Of course, if
the search is cut off before the move has to be examined, validation never has
to take place.
My Choice
For my chess program, I have chosen to generate all moves at
the same time. Here are some of the reasons why:
• I intend to use the program as a basis for several other games, most of which have no direct counterparts to chess captures.
• I have plenty of memory to play with.
• The code required to implement this technique is simpler to write and to understand; you will thank me when you see it.
• There are several freeware programs that implement piece-meal move generation; the curious reader should look at Crafty, for example, as well as James Swafford's Galahad.
While overall performance may be slightly less stellar than
otherwise, my program (written in Java, no less) wouldn't exactly provide a
challenge to Deep Blue even in the best case, so I won't feel too bad!
Next Month
Now, we are ready to delve into the brains of a
chess-playing program, with search techniques. This is such a large topic that
it will require two articles. We will begin with the basic search algorithms
common to all games, before continuing with new developments and chess-specific
optimizations in the next installment.
François Dominic Laramée, July 2000
Chess Programming Part IV: Basic Search
By François Dominic Laramée | Published Aug 06 2000 04:25 PM
in Artificial Intelligence
Fourth article in this complicated, code-deprived,
dry-as-Metamucil series, and you're still reading? Drop me an email if you are,
so that I know I'm writing these for a reason!
Anyway, this month's installment focuses on the basics of
two-agent search in strategy games: why it is useful, how to do it, and what it
implies for the computer's style of play. The techniques I will discuss apply
equally well to most two-player games, but by themselves, they are not quite
sufficient to play good chess; next month, I will describe advanced techniques
which significantly increase playing strength and search efficiency, usually at
the same time.
Why Search?
Well, basically, because we are not smart enough to do
without it.
A really bright program might be able to look at a board
position and determine who is ahead, by how much, and what sort of plan should
be implemented to drive the advantage to fruition. Unfortunately, there are so
many patterns to discern, so many rules and so many exceptions, that even the
cleverest programs just aren't very good at this sort of thing. What they are
good at, however, is computing fast. Therefore, instead of trying to figure out
good moves just by looking at a board, chess programs use their brute force to
do it: look at every move, then at every possible countermove by the opponent,
etc., until the processor melts down.
Deep searches are an easy way to "teach" the
machine about relatively complicated tactics. For example, consider the knight
fork, a move which places a knight on a square from which it can attack two
different pieces (say, a rook and the queen). Finding a way to represent this
type of position logically would require some effort, more so if we also had to
determine whether the knight was itself protected from capture. However, a
plain dumb 3-ply search will "learn" the value of a fork on its own:
it will eventually try to move the knight to the forking square, will test all
replies to this attack, and then capture one of the undefended pieces, changing
the board's material balance. And since a full-width search looks at
everything, it will never miss an opportunity: if there is a 5-move
combination, however obscure, that leads to checkmate or to a queen capture,
the machine will see it if its search is deep enough. Therefore, the deeper the
search, the more complicated the "plans" which the machine can
stumble upon.
Grandpa MiniMax
The basic idea underlying all two-agent search algorithms is
Minimax. It dates back to the Dark Ages; I believe von Neumann himself first
described it over 60 years ago.
Minimax can be defined as follows:
• Assume that there is a way to evaluate a board position so that we know whether Player 1 (whom we will call Max) is going to win, whether his opponent (Min) will, or whether the position will lead to a draw. This evaluation takes the form of a number: a positive number indicates that Max is leading, a negative number, that Min is ahead, and a zero, that nobody has acquired an advantage.
• Max's job is to make moves which will increase the board's evaluation (i.e., he will try to maximize the evaluation).
• Min's job is to make moves which decrease the board's evaluation (i.e., he will try to minimize it).
• Assume that both players play flawlessly, i.e., that they never make any mistakes and always make the moves that improve their respective positions the most.
How does this work? Well, suppose that there is a simple
game which consists of exactly one move for each player, and that each has only
two possible choices to make in a given situation. The evaluation function is
only run on the final board positions, which result from a combination of moves
by Min and Max.
Max assumes that Min will always play perfectly. Therefore,
he knows that, if he makes move A, his opponent will reply with D, resulting in
a final evaluation of -2 (i.e., a win for Min). However, if Max plays B, he is
sure to win, because Min's best move still results in a positive final value of
5. So, by the Minimax algorithm, Max will always choose to play B, even though
he would score a bigger victory if he played A and Min made a mistake!
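The mini-game can be played out in a few lines of code. In the sketch below, the leaf values follow the text where it specifies them (A-D scores -2, and B's worst case for Max is +5); the other two leaves (+10 and +8) are values I have assumed purely for illustration.

```java
// The two-ply mini-game played out in code: Min picks the minimum leaf of
// each subtree, and Max picks the maximum of those minima.
public class MiniMax {
    // TREE[i] = the two leaf scores reachable after Max's move i (0 = A, 1 = B).
    // A-C = +10 and the second B reply = +8 are assumed for illustration.
    static final int[][] TREE = { {10, -2}, {5, 8} };

    static int bestValue() {
        int best = Integer.MIN_VALUE;
        for (int[] replies : TREE) {
            int worst = Math.min(replies[0], replies[1]);  // Min plays perfectly
            best = Math.max(best, worst);                  // Max maximizes
        }
        return best;
    }
}
```

As the text says, Max settles for B and its guaranteed +5, ignoring A's tempting +10.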
The trouble with Minimax, which may not be immediately
obvious from such a small example, is that there is an exponential number of
possible paths which must be examined. This means that effort grows
dramatically with:
• The number of possible moves by each player, called the branching factor and noted B.
• The depth of the look-ahead, noted d, and usually described as "N-ply", where N is an integer number and "ply" means one move by one player. For example, the mini-game described above is searched to a depth of 2-ply, one move per player.
In chess, for example, a typical branching factor in the
middle game would be about 35 moves; in Othello, around 8. Since Minimax's
complexity is O( B^d ), a 4-ply search of a chess position would need to
explore about 1.5 million possible paths (35^4 is just over 1.5 million)! That
is a LOT of work. Adding a fifth ply would make the tree balloon to about 50
million nodes, and a sixth, to an impossible 1.8 billion!
Luckily, there are ways to cut the effort by a wide margin
without sacrificing accuracy.
Alphabeta: Making Minimax Feasible (a little)
Suppose that you have already searched Max's move B in the
mini-game above. Therefore, you know that, at best, Max's score for the entire
game will be 5.
Now, suppose that you begin searching move A, and that you
start with the path A-D. This path results in a score of -2. For Max, this is
terrible: if he plays A, he is sure to finish with, at best, -2, because Min
plays perfectly; if A-C results in a score higher than A-D's, Min will play
A-D, and if A-C should be even worse (say, -20), Min would take that path
instead. Therefore, there is no need to look at A-C, or at any other path
resulting from move A: Max must play B, because the search has already proven
that A will end up being a worse choice no matter what happens.
This is the basic idea behind the alphabeta algorithm: once
you have a good move, you can quickly eliminate alternatives that lead to
disaster. And there are a lot of those! When combined with the transposition
table we discussed earlier in the series, and which saves us from examining
positions twice if they can be reached by different combinations of moves,
alphabeta turns on the Warp drive: in the best case, it will only need to
examine roughly twice the square root of the number of nodes searched by pure
Minimax, which is about 2,500 instead of 1.5 million in the example above.
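Here is the same pruning logic in a minimal Java sketch, on a two-level tree of the same shape as the mini-game (leaf values are illustrative). Note how, once B's value of 5 is known, the single reply A-D (-2) is enough to cut off the rest of A's subtree.

```java
// Alphabeta on a two-level tree: the root is a Max node over move subtrees,
// each subtree a Min node over leaf scores. leavesVisited counts how much
// work was actually done.
public class AlphaBeta {
    static int leavesVisited = 0;

    static int search(int[][] tree, int alpha, int beta) {
        for (int[] replies : tree) {
            int worst = Integer.MAX_VALUE;       // Min node over the replies
            for (int leaf : replies) {
                leavesVisited++;
                worst = Math.min(worst, leaf);
                if (worst <= alpha) break;       // Max already has something better
            }
            alpha = Math.max(alpha, worst);
            if (alpha >= beta) break;            // opponent won't allow this node
        }
        return alpha;
    }
}
```

Searching B before A is precisely what makes the cutoff possible, which is what the next section is about.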
Ordering Moves to Optimize Alphabeta
But how can we achieve this best case scenario? Do we even
need to?
Not really. It turns out that Alphabeta is always very
efficient at pruning the search tree, as long as it can quickly find a pretty
good move to compare others to. This means that it is important to search a
good move first; the best case happens when we always look at the best possible
moves before any others. In the worst possible case, however, the moves are
searched in increasing order of value, so that each one is always better than
anything examined before; in this situation, alphabeta can't prune anything and
the search degenerates into pure, wasteful Minimax.
Ordering the moves before search is therefore very
important. Picking moves at random just won't do; we need a "smarter"
way to do the job. Unfortunately, if there was an easy way to know what the
best move would turn out to be, there would be no need to search in the first
place! So we have to make do with a "best guess".
Several techniques have been developed to order the possible
moves in as close to an optimal sequence as possible:
• Apply the evaluation function to the positions resulting from the moves, and sort them. Intuitively, this makes sense, and the better the evaluation function, the more effective this method should be. Unfortunately, it doesn't work well at all for chess, because as we will see next month, many positions just can't be evaluated accurately!
• Try to find a move which results in a position already stored in the transposition table; if its value is good enough to cause a cutoff, no search effort needs to be expended.
• Try certain types of moves first. For example, having your queen captured is rarely a smart idea, so checking for captures first may reveal "bonehead" moves rapidly.
• Extend this idea to any move which has recently caused a cutoff at the same level in the search tree. This "killer heuristic" is based on the fact that many moves are inconsequential: if your queen is en prise, it doesn't matter whether you advance your pawn at H2 by one or two squares; the opponent will still take the queen. Therefore, if the move "bishop takes queen" has caused a cutoff during the examination of move H2-H3, it might also cause one during the examination of H2-H4, and should be tried first.
• Extend the killer heuristic into a history table. If, during the course of the game's recent development, moving a piece from G2 to E4 has proven effective, then it is likely that doing so now would still be useful (even if the old piece was a bishop and has been replaced by a queen), because conditions elsewhere on the board probably haven't changed that much. The history heuristic is laughably simple to implement, needing only a pair of 64x64 arrays of integer counters, and yields very interesting results.
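Those two 64x64 arrays really are all there is to it. A minimal sketch (the depth-squared bonus used below is one common choice I have assumed, not the only one):

```java
// The history heuristic: a pair of from-square x to-square counters, one per
// side. Bump a counter when its move causes a cutoff; read it when ordering.
public class HistoryTable {
    // [side][from][to] -- two 64x64 arrays of integer counters.
    static final int[][][] history = new int[2][64][64];

    // Deeper cutoffs save more work, so weight them more heavily.
    static void recordCutoff(int side, int from, int to, int depth) {
        history[side][from][to] += depth * depth;
    }

    static int score(int side, int from, int to) {
        return history[side][from][to];
    }
}
```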
Having said all that about "reasonable ideas", it
turns out that the most effective method is one which goes against every single
bit of human intuition: iterative deepening.
Iterative Deepening AlphaBeta
If you are searching a position to depth 6, the ideal move
ordering would be the one yielded by a prior search of the same position to the
same depth. Since that is obviously impossible, how about using the results of
a shallower search, say of depth 5?
This is the idea behind iterative deepening: begin by
searching all moves arising from the position to depth 2, use the scores to
reorder the moves, search again to depth 3, reorder, etc., until you have
reached the appropriate depth.
This technique flies in the face of common sense: a
tremendous amount of effort is duplicated, possibly 8-10 times or more. Or is
it?
Consider the size of a search tree of depth d with branching
factor B. The tree has B nodes at depth 1, B*B at depth 2, B*B*B at depth 3,
etc. Therefore, searching to depth (d-1) yields a tree B times smaller than
searching to depth d! If B is large (and remember that it is about 35 during
the middle game in chess), the overwhelming majority of the effort expended
during search is devoted to the very last ply. Duplicating a search to depth
(d-1) is a trifling matter: in fact, even if it yielded no advantages
whatsoever, iterative deepening would only cost less than 4% extra effort on a
typical chess position!
However, the advantages are there, and they are enormous:
using the results of a shallower search to order the moves prior to a deeper
one produces a spectacular increase in the cutoff rate. Therefore, IDAB
actually examines far fewer nodes, on average, than a straight AB search to the
same depth using any other technique for move ordering! When a transposition
table enters the equation, the gain is even more impressive: the extra work
performed to duplicate the shallow parts of the search drops to nothing because
the results are already stored in the table and need not be computed again.
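The driver loop itself is short. In the Java sketch below, searchToDepth is a stand-in that merely scores a move so the reordering mechanics can be shown; a real program would run a full alphabeta search there, and the depth parameter would actually matter.

```java
import java.util.Arrays;

// Sketch of an iterative-deepening driver. searchToDepth stands in for a
// real alphabeta call; names are illustrative.
public class IterDeep {
    static int searchToDepth(int move, int depth) {
        return move;   // a real program would search to 'depth' plies here
    }

    // Depths 2..maxDepth: search every move, then sort best-first so that the
    // next (deeper, far more expensive) iteration gets maximum cutoffs.
    static Integer[] search(Integer[] moves, int maxDepth) {
        for (int depth = 2; depth <= maxDepth; depth++) {
            final int d = depth;
            Arrays.sort(moves, (a, b) -> searchToDepth(b, d) - searchToDepth(a, d));
        }
        return moves;
    }
}
```

Since each extra ply multiplies the tree by B, all of the shallow iterations together cost less than about 1/(B-1) of the final one, which is the "under 4%" figure quoted above.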
Computer Playing Style
Iterative deepening alphabeta combined with a transposition
table (and a history table to kickstart the effort) allows the computer to
search every position relatively deeply and to play a reasonable game of chess.
That being said, its Minimax ancestor imposes a very definite playing style on
the computer, one which is not exactly the most spectacular in the world.
For example, suppose that the machine searches a position to
depth 8. While looking at a certain move, it sees that every possible response
by the opponent would let it win the game in dazzling manner... Except for a
single opaque, difficult, obfuscated and almost maddeningly counter-intuitive
sequence, which (if followed to perfection) would allow the opponent to salvage
a small chance of eventually winning the game. Against a human player (even a
Grandmaster), such a trap might turn the game into one for the history books.
However, if the machine then finds a boring move which
always forces a draw, it will immediately discard the trap, because it assumes
that the opponent would find the perfect counter, no matter how improbable that
is, and that the draw is the best it can hope for!
As a result, you might say that machines play an overly
cautious game, as if every opponent was a potential world champion. Combined
with the fact that computers often can't search deep enough to detect the traps
which human players devise against them, this allows very skilled humans to
"waste time" and confuse the machine into making a blunder which they
can exploit. (Human players also study their opponent's styles for weaknesses;
if Kasparov had been given access to, say, a hundred games played by Deep Blue
before their match, he might have been able to find the flaws in its game and
beat it. But we'll never know for sure.)
Next Month
In Part V, we will discuss the limitations of straight,
fixed-depth alphabeta search, and how to improve playing strength using
techniques like the null-move heuristic, quiescence search, aspiration search
and MTD(f), and the "singular extensions" which made Deep Blue
famous. Hold on, we're almost done!
François Dominic Laramée, August 2000
Chess Programming Part V: Advanced Search
By François Dominic Laramée | Published Sep 05 2000 06:09 PM
in Artificial Intelligence
Hey, it looks like there are dozens (and dozens) of you
reading this series! I'm tickled pink!
In this next-to-last article, we will examine advanced
search-related techniques which can speed up and/or strengthen your
chess-playing program. In most cases, the concepts (if not the actual code)
also apply to a variety of other 2-player games; adapting them to your
particular problem, however, shall remain an exercise for the proverbial
reader.
Why Bother?
So far, all of the search algorithms we have looked at
examine a position's consequences to a fixed "depth". However, this
is rarely a good thing. For example, suppose that your program uses an
iterative-deepening alpha-beta algorithm with maximum depth 5-ply. Now look at
these cases:
• Along a certain line of play, you discover a position where one of the players is checkmated or stalemated at depth 3. Obviously, you don't want to keep searching, because the final state of the game has been resolved. Not only would searching to depth 5 be a colossal waste of effort, it may also allow the machine to finagle its way into an illegal solution!
• Now, suppose that, at depth 5, you capture a pawn. The program would be likely to score this position in a favorable light, and your program might decide that the continuation leading to it is a useful one. However, if you had looked one ply further, you might have discovered that capturing the pawn left your queen undefended. Oops.
• Finally, suppose that your queen is trapped. No matter what you do, she will be captured by the opponent at ply 4, except for one specific case where her death will happen at ply 6. If you search to depth 5, the continuations where the queen is captured at ply 4 will be examined accurately, and scored as likely disasters. However, in the unique line of play where the queen is only captured at ply 6 (outside of the search tree), the capture is never revealed to the machine, which thinks that the queen is safe and gives the line a much better score! Now, if all you have to do to push the queen capture outside of the search tree is delay the opponent with a diversion, doing so may be worth the risk: although it could damage your position, it might also cause the opponent to make a mistake and allow the queen to escape. But what if you can only delay the queen capture by sacrificing a rook? To the machine, losing a rook at ply 4 is less damaging than losing a queen, so it will merrily throw its good piece away and "hide" the too-horrible-to-mention queen capture beyond its search horizon. (During its next turn, of course, the machine will discover that the queen must now be captured at ply 4 in all cases, and that it has wasted a rook for no gain.) Hans Berliner described this "horizon effect" a long time ago, and it is the most effective justification for the "quiescence search" described in the next section.
The bottom line is this: a great many positions in chess
(and in other games as well) are just too chaotic to be evaluated properly. An
evaluation function can only be applied effectively to "quiet"
positions where not much of importance is likely to happen in the immediate
future. How to identify these is our next topic.
Quiet, Please!
There are two ways to assess a position's value: dynamic
evaluation (i.e., look at what it may lead to) and static evaluation (i.e., see
what it looks like on its own, irrespective of consequences). Dynamic
evaluation is performed through search; as we have just mentioned, static
evaluation is only feasible when the position is not likely to undergo an
overwhelming change of balance in the near future. Such relatively stable
positions are called "quiet" or "quiescent", and they are
identified via "quiescence search".
The basic concept of Quiescence Search is the following:
once the program has searched everything to a fixed depth (say, 6-ply), we
continue each line of play selectively, by searching "non-quiescent"
moves only, until we find a quiescent position, and only then apply the evaluator.
Finding a quiet position requires some knowledge about the
game. For example, which moves are likely to cause a drastic change in the
balance of power on the board? For chess, material balance tends to be the
overwhelming consideration in the evaluator, so anything that changes material
is fair game: captures (especially those of major pieces) and pawn promotions
certainly qualify, while checks may also be worth a look (just in case they
might lead to checkmate). In checkers, captures and promotions also seem like
reasonable choices. In Othello, every single move is a capture, and
"material balance" can change so much in so little time that it might
be argued that there are no quiet positions at all!
My own program uses a simple quiescence search which extends
all lines of play (after a full-width search to depth X) by looking exclusively
at captures. Since there are usually not that many legal captures in a given
position, the branching factor in the quiescence search tends to be small (4-6
on average, and quickly converging to 0 as pieces are eaten on both sides).
Nevertheless, the quiescence search algorithm is called on a LOT of positions,
and so it tends to swallow 50% or more of the entire processing time. Make sure
that you need such a scheme in your own game before committing to it.
Only when no capture is possible does my program apply its
evaluator. The result is a selectively-extended search tree which is anything
but fixed-depth, and which defeats most of the nasty consequences of the
"horizon effect".
The All-Important Null-Move
One of the most effective ways to speed up a chess program
is to introduce the concept of a null move into the equation.
The null move consists, quite simply, of skipping a turn and
letting the opponent play two moves in a row. In the overwhelming majority of
positions, doing nothing is a bone-head idea: you should (almost) always be
able to do *something* to improve your lot. (To be honest, there are a few
"damned if I do, damned if I don't" positions where the null move
would actually be your best bet, and the computer will not play them correctly,
but such "zugzwang" positions are hopeless anyway, so the loss of
performance is not very traumatic.)
Allowing the computer to try a null move during search has
several advantages related to speed and accuracy. For example:
• Suppose that a position is so overwhelmingly in your favor that, even if you skipped your turn, the opponent couldn't respond with anything that would help. (In program terms, you would get a beta cutoff even without making a move.) Suppose further that this position is scheduled to be searched to depth N. The null move, in effect, takes out an entire ply of the search tree (you are searching only the null move instead of all your legal ones) and if your branching factor is B, searching the null move is equivalent to looking at a single depth N-1 subtree instead of B of them. With B=35 as in the typical chess middlegame, null-move search may only consume 3% of the resources required by a full depth-N examination. If the null move search reveals that you are still too strong even without playing (i.e., it creates a cutoff), you have saved 97% of your effort; if not, you must examine your own legal moves as usual, and have only wasted an extra 3%. On average, the gain is enormous.
• Now, suppose that, during quiescence search, you reach a position where your only legal capture is rook-takes-pawn, which is immediately followed by the opponent's knight-takes-rook. You'd be a lot better off not making the capture, and playing any other non-capture move, right? You can simulate this situation by inserting the null move into the quiescence search: if, in a given position during quiescence search, it is revealed that the null move is better than any capture, you can assume that continuing with captures from this position is a bad idea, and that since the best move is a quiet one, this is a position where the evaluation function itself should be applied!
Overall, the null-move heuristic can save between 20% and
75% of the effort required by a given search. Well worth the effort, especially
when you consider that adding it to a program is a simple matter of changing
the "side to play" flag and adding less than a dozen lines of code in
the quiescence search algorithm!
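In negamax terms, the first use of the null move can be sketched as follows. Here 'search' is a degenerate stand-in for a real alphabeta call (which would also flip the side-to-move flag and usually reduce the null search's depth further), so treat this as an illustration of the mechanics, not engine code.

```java
// Null-move test: before generating any of our own moves at depth N, let the
// opponent move twice by searching a depth N-1 subtree with the score negated.
public class NullMove {
    // Stand-in for alphabeta(position, depth): just echoes the score from
    // the side to move's point of view.
    static int search(int scoreForSideToMove, int depth) {
        return scoreForSideToMove;
    }

    // True if we are so strong that, even after passing, the opponent's best
    // reply still leaves us above beta -- a cutoff with no moves generated.
    static boolean nullMoveCutoff(int score, int depth, int beta) {
        int nullScore = -search(-score, depth - 1);  // opponent moves twice
        return nullScore >= beta;
    }
}
```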
Aspirated Search and MTD(f)
Plain old alphabeta assumes nothing about a position's
ultimate minimax value. It looks at *everything*, no matter how preposterous.
However, if you have a pretty good idea of what the value will turn out to be
(for example, because you are running an iterative-deepening scheme and have
received the previous iteration's results), you might be able to identify lines
of play that are so out of whack with your expectations that they can't
possibly be right, and cut them off pre-emptively.
For example, suppose that you have reason to believe that a
position's value will be close to 0, because it is very well balanced. Now,
suppose that an internal node's preliminary evaluation is at +20,000. You can
cutoff with reasonable confidence.
This is the idea behind "aspiration search", a
variant of alphabeta in which, instead of using +INFINITY and -INFINITY as the
initial bounds of the search, you set a small window around the expected value
instead. If the actual value happens to fall within the window, you win: you'll
get it without error, and faster than you would otherwise (because of the many
extra cutoffs). If not, the algorithm will fail, but the error will be easy to
detect (because the minimax value you'll receive will be equal to one of the
bounds); you'll have to waste a bit of time re-searching with a wider window.
If the former case happens more often than the latter, you win on average.
Obviously, the better your initial guess of the expected value, the more useful
this technique is.
In the mid 1990's, researcher Aske Plaat extended aspiration
search to its logical conclusion: what if you called an aspirated alphabeta
with a search window of width equal to zero? It would fail all the time, of
course... But it would do so *very quickly*, because it would cutoff every path
almost immediately. Now, if the failure indicates that the actual value is
lower than your estimate, you can try again, with another zero-width window
around a smaller estimate, etc. In a sense, you could then use alphabeta to
perform a binary search into the space of all possible minimax values, until
you reach the only call which will *not* fail because the zero-width window
will be centered on the position's actual value!
This brilliant idea, presented in a paper available on the
web at http://theory.lcs.mi...plaat/mtdf.html, has been embodied in the MTD(f)
search algorithm, which is all of 10 lines long. Tacked on top of an alphabeta
implementation equipped with a transposition table, MTD(f) is incredibly efficient
and highly parallel-friendly. It also works better with
"coarse-grain" (and therefore probably simpler and faster)
evaluators: it is easy to see that it takes fewer probes to zero in on the
actual value in a binary search if the smallest "atom" of value is
equal to, say, 0.1 pawns rather than 0.001 pawns.
There are other alphabeta variants in wider use (namely, the
infamous NegaScout; I would rather teach General Relativity to orangutans than
get into that mess) but Plaat insists that MTD(f) is the most efficient
algorithm in existence today and I'll take his word for it. My own program uses
MTD(f); you'll be able to marvel at the algorithm's simplicity very shortly!
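For the curious, here is the driver itself, essentially as Plaat describes it. The zero-window search below is a stand-in that simply reports the position's true minimax value, the way a fail-soft alphabeta-with-memory would report a bound; in a real program it is alphabeta called with the window (beta - 1, beta).

```java
// The MTD(f) driver: repeated zero-window alphabeta probes narrow the
// interval [lower, upper] until it closes on the true minimax value.
public class MTDf {
    static int probes = 0;
    static final int TRUE_VALUE = 7;    // illustrative minimax value

    // Stand-in for a fail-soft alphabeta(root, beta - 1, beta, depth).
    static int zeroWindowSearch(int beta) {
        probes++;
        return TRUE_VALUE;
    }

    static int mtdf(int firstGuess) {
        int g = firstGuess;
        int lower = Integer.MIN_VALUE, upper = Integer.MAX_VALUE;
        while (lower < upper) {
            int beta = (g == lower) ? g + 1 : g;
            g = zeroWindowSearch(beta);
            if (g < beta) upper = g;    // failed low: value is at most g
            else          lower = g;    // failed high: value is at least g
        }
        return g;
    }
}
```

With the true value at +7 and a first guess of 0, the driver above converges in two probes.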
Singular Extensions
One last thing before we leave the topic of search: in
chess, some moves are obviously better than others, and it may not be necessary
to waste too much time searching for alternatives.
For example, suppose that after running your iterative
algorithm to depth N-1, you discover that one of your moves is worth +9000 (i.e.,
a capture of the opponent's queen) and all others are below 0. If saving time
is a consideration, like in tournaments, you may want to bypass the whole depth
N search and only look at the best move to depth N instead: if this extra ply
does not lower its evaluation much, then you assume that the other moves won't
be able to catch up, and you stop searching early. (Remember: if there are 35
valid moves at each ply on average, you may have just saved 97% of your total
effort!)
Deep Blue's team has pushed this idea one step further and
implemented the concept of "singular extensions". If, at some point
in the search, a move seems to be a lot better than all of the alternatives, it
will be searched an extra ply just to make sure that there are no hidden traps
there. (This is a vast oversimplification of the whole process, of course, but
that's the basic idea.) Singular extensions are costly: adding an extra ply to
a node roughly doubles the number of leaves in the tree, causing a commensurate
increase in the number of calls to the evaluator; in other words, Deep Blue's
specialized hardware can afford it, my cranky Java code can't. But it's hard to
argue with the results, isn't it?
Next Month
In Part VI, we wrap up the series with a discussion of
evaluation functions, the code which actually tells your program whether a
given board position is good or bad. This is an immense topic, and people can
(and do) spend years refining their own evaluators, so we will have to content
ourselves with a rather high-level discussion of the types of features which
should be examined and their relative importance. If everything goes according
to plan, I should also have some Java code for you to sink your teeth into at
about that time, so stick around, won't you?
François Dominic Laramée, September 2000
Chess Programming Part VI: Evaluation Functions
By François Dominic Laramée | Published Oct 08 2000 04:07 PM
in Artificial Intelligence
It's been six months, and I know sometimes it must have felt
like I would never shut up, but here we are: the sixth and last installment of
my chess programming series. Better yet: my Java chess program (see attached
resource file), primitive though it may be, is now complete, and I shipped it
to Gamedev along with this, which proves that I know (a little bit of) what
I've been talking about.
This month's topic, the evaluation function, is unique in a
very real sense: while search techniques are pretty much universal and move
generation can be deduced from a game's rules and nothing more, evaluation requires
a deep and thorough analysis of strategy. As you can well imagine, it is
impossible to assess a position's relative strengths if we don't know what
features are likely to lead to victory for one player or the other. Therefore,
many of the concepts discussed here may apply to other games in very different
fashion, or not at all; it is your job as programmer to know your game and
decide what the evaluator should take into consideration.
Without further delay, let us take a look at some board
evaluation metrics and at how they can be used to evaluate a chess position.
Material Balance
To put it simply, material balance is an account of which
pieces are on the board for each side. According to chess literature, a queen
may be worth 900 points, a rook 500, a bishop 325, a knight 300 and a pawn 100;
the king has infinite value. Computing material balance is therefore a simple
matter: a side's material value is equal to
MB = Sum( Np * Vp )
where Np is the number of pieces of a certain type on the
board and Vp is that piece's value. If you have more material on the board than
your opponent, you are in good shape.
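The MB formula above translates directly into code. A minimal Java sketch using the classic values just quoted (the array layout is my own assumption, not code from the article's program):

```java
// Material balance MB = Sum(Np * Vp), with the piece values quoted in the
// text: pawn 100, knight 300, bishop 325, rook 500, queen 900. The king,
// having "infinite" value, is simply left out of the sum.
public class MaterialBalance {
    // Values indexed as: pawn, knight, bishop, rook, queen.
    static final int[] VALUES = { 100, 300, 325, 500, 900 };

    // counts[i] = number of pieces of type i that one side has on the board.
    static int material(int[] counts) {
        int sum = 0;
        for (int i = 0; i < VALUES.length; i++)
            sum += counts[i] * VALUES[i];
        return sum;
    }

    public static void main(String[] args) {
        int[] white = { 8, 2, 2, 2, 1 }; // the full starting army
        int[] black = { 7, 2, 2, 2, 1 }; // same, minus one pawn
        System.out.println(material(white) - material(black)); // prints 100
    }
}
```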
Sounds simple, doesn't it? Yet, it is by far the dominant
factor in any chess board evaluation function. CHESS 4.5's creators estimate
that an enormous advantage in position, mobility and safety is worth less than
1.5 pawns. In fact, it is quite possible to play decent chess without
considering anything else!
Sure, there are positions where you should sacrifice a piece
(sometimes even your queen) in exchange for an advantage in momentum. These,
however, are best discovered through search: if a queen sacrifice leads to mate
in 3 moves, your search function will find the mate (provided it looks deep
enough) without requiring special code. Think of the nightmares if you were
forced to write special-case code in your evaluator to determine when a
sacrifice is worth the trouble!
Few programs use a material evaluation function as primitive
as the one I indicated earlier. Since the computation is so easy, it is
tempting to add a few more features into it, and most people do it all the
time. For example, it is a well-known fact that once you are ahead on material,
exchanging pieces of equal value is advantageous. Exchanging a single pawn is
often a good idea, because it opens up the board for your rooks, but you would
still like to keep most of your pawns on the board until the endgame to provide
defense and an opportunity for queening. Finally, you don't want your program
to panic if it plays a gambit (i.e., sacrifices a pawn) from its opening book,
and therefore you may want to build a "contempt factor" into the
material balance evaluation; this allows your program to consider the position
even until it falls behind by more than 150 points of material, for example.
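To make those two refinements concrete, here is a hedged sketch: a contempt term that hides small deficits (such as a gambit pawn), and a simplification bonus that rewards trading pieces when ahead. All the constants are placeholders of my own, not values from any real program.

```java
// Two refinements of raw material balance described in the text: a
// "contempt factor" (small deficits are ignored) and a bonus for
// simplifying the position when ahead on material.
public class RefinedMaterial {
    static final int CONTEMPT = 150; // tolerate being 1.5 pawns down (assumed)

    static int refined(int myMaterial, int oppMaterial) {
        int diff = myMaterial - oppMaterial + CONTEMPT;
        if (diff > 0) {
            // Ahead (or close enough): reward positions with less total
            // material left, since exchanges favor the stronger side.
            int totalOnBoard = myMaterial + oppMaterial;
            diff += (8000 - totalOnBoard) / 100; // hypothetical scaling
        }
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(refined(3950, 3850)); // prints 252
    }
}
```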
Note that, while material balance is highly valuable in
chess and in checkers, it is deceptive in Othello. Sure, you must control more
squares than the opponent at the end of the game to win, but it is often better
to limit his options by having as few pieces on the board as possible during
the middlegame. And in other games, like Go-Moku and all other Connect-N
variations, material balance is irrelevant because no pieces are ever captured.
Mobility and Board Control
One of the characteristics of checkmate is that the victim
has no legal moves left. Intuitively, it also seems better to have a lot of
options available: a player is more likely to be able to find a good line of
play if he has 30 legal moves to choose from than if he is limited to 3.
In chess, mobility is easily assessed: count the number of
legal moves for each side given the position, and you are done. However, this
statistic has proven to be of little value. Why? For one thing, many chess
moves are pointless. It may be legal to make your rook patrol the back rank one
square at a time, but it is rarely productive. For another, trying to limit the
opponent's mobility at all costs may lead the program to destroy its own
defensive position in search of "pointless checks": since there are
rarely more than 3-4 legal ways to evade check in any given position, a
mobility-oriented program would be likely to make incautious moves to put the
opponent in check, and after a while, it may discover that it has accomplished
nothing and has dispersed its forces all over the board.
Still, a few sophisticated mobility evaluation features are
always a good thing. My program penalizes "bad bishops", i.e.,
bishops whose movement is hampered by a large number of pawns on squares of the
same color, as well as knights sitting too close to the edges of the board. As
another example, rooks are more valuable on open and semi-open files, i.e.,
files where there are no pawns (or at least no friendly ones).
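The bad-bishop penalty lends itself to a quick sketch. The square numbering, the inputs and the 10-point weight below are my own assumptions, not the article program's actual code.

```java
// "Bad bishop" detection as described above: count friendly pawns on
// squares of the same color as the bishop's square. Squares are numbered
// 0..63 (a1 = 0), and (rank + file) % 2 gives a square-color parity.
public class BadBishop {
    static int squareColor(int square) {
        return (square / 8 + square % 8) % 2;
    }

    // Penalty in centipawns for a bishop hemmed in by its own pawns.
    static int badBishopPenalty(int bishopSquare, int[] friendlyPawnSquares) {
        int color = squareColor(bishopSquare);
        int blockers = 0;
        for (int p : friendlyPawnSquares)
            if (squareColor(p) == color) blockers++;
        return blockers * 10; // 10 points per same-colored pawn (assumed)
    }

    public static void main(String[] args) {
        // Bishop on c1 (square 2), pawns on c2, d2, e2 (squares 10..12).
        System.out.println(badBishopPenalty(2, new int[]{10, 11, 12})); // prints 10
    }
}
```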
A close relative of mobility is board control. In chess, a
side controls a square if it has more pieces attacking it than the opponent. It
is usually safe to move to a controlled square, and hazardous to move to one
controlled by the opponent. (There are exceptions: moving your queen to a
square attacked by an enemy pawn is rarely a good idea, no matter how many ways
you can capture that pawn once it has done its damage. You may also voluntarily
put a piece in harm's way to lead a defender away from an even more valuable
spot.) In chess, control of the center is a fundamental goal of the opening.
However, control is somewhat difficult to compute: it requires maintaining a
database of all squares attacked by all pieces on the board at any given time.
Many programs do it; mine doesn't.
While mobility is less important than it seems to the chess
programmer, it is extremely important in Othello (where the side with the
fewest legal moves in the endgame is usually in deep trouble). As for board
control, it is the victory condition of Go.
Development
An age-old maxim of chess playing is that minor pieces
(bishops and knights) should be brought into the battle as quickly as possible,
that the King should castle early and that rooks and queens should stay quiet
until it is time for a decisive attack. There are many reasons for this:
knights and bishops (and pawns) can take control of the center, support the
queen's attacks, and moving them out of the way frees the back rank for the
more potent rooks. Later on in the game, a rook running amok on the seventh
rank (i.e., the base of operations for the opponent's pawns) can cause a
tremendous amount of damage.
My program uses several factors to measure development.
First, it penalizes any position in which the King's and Queen's pawns have not
moved at all. It also penalizes knights and bishops located on the back rank
where they hinder rook movement, tries to prevent the queen from moving until
all other pieces have, and gives a big bonus to positions where the king has
safely castled (and smaller bonuses to cases where it hasn't castled yet but
hasn't lost the right to do so) when the opponent has a queen on the board. As
you can see, the development factor is important in the opening but quickly
loses much of its relevance; after 10 moves or so, just about everything that
can be measured here has already happened.
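The development factors just listed boil down to a handful of bonuses and penalties. A sketch with made-up weights, and with the board tests abstracted into boolean and integer inputs (a real program would compute these from its position representation):

```java
// Development term following the factors described above: penalties for
// unmoved center pawns, minors still on the back rank and an early queen
// sortie; a bonus for having castled while the opponent keeps her queen.
public class Development {
    static int developmentScore(boolean centerPawnsUnmoved,
                                int minorsOnBackRank,
                                boolean queenMovedBeforeMinors,
                                boolean castled,
                                boolean opponentHasQueen) {
        int score = 0;
        if (centerPawnsUnmoved) score -= 30;          // king/queen pawns at home
        score -= 15 * minorsOnBackRank;               // undeveloped knights/bishops
        if (queenMovedBeforeMinors) score -= 20;      // premature queen outing
        if (castled && opponentHasQueen) score += 40; // king safely tucked away
        return score;
    }

    public static void main(String[] args) {
        // A castled, fully developed side facing a queen: bonus only.
        System.out.println(developmentScore(false, 0, false, true, true)); // prints 40
    }
}
```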
Note that favoring development can be dangerous in a game
like Checkers. In fact, the first player to vacate one of the squares on his
back rank is usually in trouble; avoiding development of these important
defensive pieces is a very good idea.
Pawn Formations
Chess grandmasters often say that pawns are the soul of the
game. While this is far from obvious to the neophyte, the fact that great
players often resign over the loss of a single pawn clearly indicates that they
mean it!
Chess literature mentions several types of pawn features,
some valuable, some negative. My program looks at the following:
• Doubled or tripled pawns. Two or more pawns of the same color on the same
file are usually bad, because they hinder each other's movement.
• Pawn rams. Two opposing pawns "butting heads" and blocking each other's
forward movement constitute an annoying obstacle.
• Passed pawns. Pawns which have advanced so far that they can no longer be
attacked or rammed by enemy pawns are very strong, because they threaten to
reach the back rank and achieve promotion.
• Isolated pawns. A pawn which has no friendly pawns on either side is
vulnerable to attack and should seek protection.
• Eight pawns. Having too many pawns on the board restricts mobility; opening
at least one file for rook movement is a good idea.
A final note on pawn formations: a passed pawn is extremely
dangerous if it has a rook standing behind it, because any piece that would
capture the pawn is dead meat. My program therefore scores a passed pawn as
even more valuable if there is a rook on the same file and behind the pawn.
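Two of those pawn terms lend themselves to a quick sketch: doubled-pawn detection from per-file pawn counts, and the extra credit for a rook standing behind a passed pawn. The weights are illustrative assumptions.

```java
// Pawn-structure terms from the list above. pawnsPerFile[f] holds the
// number of friendly pawns on file f (0..7); any file with two or more
// contains doubled (or tripled) pawns and is penalized.
public class PawnStructure {
    static int doubledPawnPenalty(int[] pawnsPerFile) {
        int penalty = 0;
        for (int n : pawnsPerFile)
            if (n > 1) penalty += 20 * (n - 1); // 20 points per extra pawn (assumed)
        return penalty;
    }

    // A passed pawn is worth more still with a friendly rook behind it.
    static int passedPawnBonus(boolean rookBehindOnSameFile) {
        return rookBehindOnSameFile ? 60 : 40; // assumed weights
    }

    public static void main(String[] args) {
        int[] files = { 1, 2, 0, 1, 1, 1, 1, 1 }; // doubled pawns on the b-file
        System.out.println(doubledPawnPenalty(files)); // prints 20
    }
}
```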
King Safety and Tropism
We have already touched on king safety earlier: in the
opening and middle game, protecting the king is paramount, and castling is the
best way to do it.
However, in the endgame, most of the pieces on both sides
are gone, and the king suddenly becomes one of your most effective offensive
assets! Leaving him behind a wall of pawns is a waste of resources.
As for "tropism", it is a measure of how easy it
is for a piece to attack the opposing king, and is usually measured in terms of
distance. The exact rules used to compute tropism vary by piece type, but they
all amount to this: the closer you are to the opposing king, the more pressure
you put on it.
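Distance-based tropism is easy to sketch. Here, distance is the Chebyshev (king-move) distance between the attacking piece and the enemy king; the per-square weight is an assumption, and as noted above, real programs vary the rules by piece type.

```java
// Tropism: the closer a piece is to the opposing king, the bigger the
// bonus. Chebyshev distance is the number of king moves between squares.
public class Tropism {
    static int chebyshev(int file1, int rank1, int file2, int rank2) {
        return Math.max(Math.abs(file1 - file2), Math.abs(rank1 - rank2));
    }

    static int tropismBonus(int pieceFile, int pieceRank,
                            int kingFile, int kingRank) {
        int d = chebyshev(pieceFile, pieceRank, kingFile, kingRank);
        return (7 - d) * 5; // 5 points per square of proximity (assumed)
    }

    public static void main(String[] args) {
        // A piece one square away from the enemy king on h8.
        System.out.println(tropismBonus(6, 6, 7, 7)); // prints 30
    }
}
```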
Picking the Right Weights
Now that we have identified the features we would like to
measure, how do we assign relative weights to each?
That, my friends, is a million-dollar question. People can
(and do) spend years fiddling with the linear combination of features in their
evaluation functions, sometimes giving a little more importance to mobility,
sometimes de-emphasizing safety, etc. I wish there were an absolute solution,
but there isn't. This is still very much a matter for trial and error. If your
program plays well enough, great. If not, try something else, and play it
against the old version; if it wins most of the time, your new function is an
improvement.
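In code, that linear combination of features amounts to nothing more than a weighted sum; the years of fiddling go into choosing the weights, not the shape of the function. A minimal sketch (feature ordering and weights are placeholders):

```java
// The evaluator as a linear combination of feature scores: tuning means
// changing the weights, not the structure of this function.
public class Evaluator {
    static int evaluate(int[] featureScores, int[] weights) {
        int total = 0;
        for (int i = 0; i < featureScores.length; i++)
            total += weights[i] * featureScores[i];
        return total;
    }

    public static void main(String[] args) {
        // e.g. a material score and a mobility score with assumed weights.
        System.out.println(evaluate(new int[]{100, 5}, new int[]{1, 2})); // prints 110
    }
}
```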
Three things you may want to keep in mind:
• It is very difficult to refine an evaluation function enough to gain as much
performance as you would from an extra ply of search. When in doubt, keep the
evaluator simple and leave as much processing power as possible to alphabeta.
• Unless you are after the World Championship, your evaluator does not need to
be all-powerful!
• If your game plays really fast, you may want to try evolving an appropriate
evaluator using a genetic algorithm. For chess, though, this would likely
require thousands of years!
Next Month
Well, there ain't no next month. This is it.
If I wanted to drag this series out even longer, I could write
about opening books, endgame libraries, specialized chess hardware and a
zillion other things. I could, I could. But I won't. Some of these topics I
reserve for the book chapter I will be writing on this very topic later this
Fall. Others I just don't know enough about to contribute anything useful. And
mostly, I'm just too lazy to bother.
Still, I hope you enjoyed reading this stuff, and that you
learned a useful thing or two or three. If you did, look me up next year at the
GDC or at E3, and praise me to whomever grants freelance design and programming
contracts in your company, will ya?
Cheers!
François Dominic Laramée, October 2000
Watch and follow along as the process of writing a chess
engine is demonstrated and explained.
There are currently two tutorial series:
Write a simple Java chess engine with GUI in under 1,000
lines of code
OR
Write an advanced bitboard-based Java chess engine using
modern techniques.
Links:
C#
Chess programming