tic tac toe on a holiday

Jul 06, 2010

Note: In my previous posts, I called the algorithm I used Minmax, where in fact it's really Negamax. Going forward with this post, I will call it Negamax.

I worked a bit on the Tic Tac Toe stories, but mostly distracted from the nice, holiday weekend. Unfortunately, The Sight Below canceled so I did not go to his show. I guess there's always a next time.

I probably shouldn't have worked on a story that was not under the Working phase, but this is what I ended up doing. I have a bad habit of scratching my own itch and story #11, "MinMax Player takes too long to move" was the itch. I was trying to implement the alpha-beta pruning for the Negamax algorithm in hopes that it will resolve the lag present in the game. I looked at three different pseudocodes I found on the internet:

MinMax with Alpha-Beta pruning (not Negamax)

alpha-beta(player,board,alpha,beta)
    if(game over in current board position)
        return winner

    children = all legal moves for player from this board
    if(max's turn)
        for each child
            score = alpha-beta(other player,child,alpha,beta)
            if score > alpha then alpha = score (we have found a better best move)
            if alpha >= beta then return alpha (cut off)
        return alpha (this is our best move)
    else (min's turn)
        for each child
            score = alpha-beta(other player,child,alpha,beta)
            if score < beta then beta = score (opponent has found a better worse move)
            if alpha >= beta then return beta (cut off)
        return beta (this is the opponent's best move)

Negamax with Alpha-Beta pruning (with color argument)

function negamax(node, depth, α, β, color)
    if node is a terminal node or depth = 0
        return color * the heuristic value of node
    else
        foreach child of node
            α := max(α, -negamax(child, depth-1, -β, -α, -color))
            {the following if statement constitutes alpha-beta pruning}
            if α≥β
                return α
        return α

Negamax with Alpha-Beta pruning (without the color argument)

function alphabeta(node, depth, α, β)         
    (* β represents previous player best choice - doesn't want it if α would worsen it *)
    if  depth = 0 or node is a terminal node
        return the heuristic value of node
    for each child of node
        α := max(α, -alphabeta(child, depth-1, -β, -α))     
        (* use symmetry, -β becomes subsequently pruned α *)
        if β≤α
            break                             (* Beta cut-off *)
    return α

When I first spiked and tried implementing the code, I used the first pseudocode from Yosen Lin's article. Unfortunately, it was not using Negamax, so I couldn't figure out how to implement alpha-beta pruning to my Negamax code. I then looked at the second pseudocode, but I didn't quite understand the color argument. The way I implemented Negamax, I did not need color to switch between pieces (aka colors). When I looked at the last pseudocode, I caught something I overlooked in the second pseudocode - the recursive call inside the function had the alpha and beta values not only switching signs, but were swapped as well. When I fixed the code and also make it look more like the last pseudocode, it started working. And boy was it significantly faster than the original Negamax code. Here is the code:

  def get_alpha_beta_move(board, piece, depth, alpha, beta)
    if board.game_over?
      return evaluate_score(board, piece, depth)
    else
      opponent = get_opponent(piece)
      empty_squares = board.get_empty_squares
      empty_squares.each do |s|
        temp_board = Board.new(board.to_a)
        temp_board.move(s, piece)
        score = -get_alpha_beta_move(temp_board, opponent, depth + 1, -beta, -alpha)
        if score > alpha
          alpha = score
          @best_move = s if depth == 1
        end
        break if alpha >= beta
      end
      return alpha
    end
  end

If you take a look, the key line is where it breaks the loop when alpha is greater or equal to beta. This is where it prunes branches that give no value to the MinMax player. To test the performance differences between Negamax code and Negamax with alpha-beta pruning code, I tested how long it took the MinMax player to make its first move in ten separate games.

NegamaxNegamax with Alpha-Beta pruning (in seconds)
7.0140.836
6.750.53
6.3240.469
5.5760.45
5.8930.368
5.3920.312
5.4090.311
5.5080.325
5.6330.258
5.5690.3

As you can see, the pruning significantly helps the first move. The median for Negamax only code was 5.6045 while Negamax with alpha beta pruning was 0.3465. The mean for Negamax only was 5.9068 and the Negamax alpha beta pruning was 0.4159. You might also notice the times for both algorithms improved after each iteration. I could be wrong, but I think it has something to do with the JIT-compiler in JRuby. Yes, these tests were run using JRuby 1.5.1 (1.8.7-p249).

Well, I hope you learned something from this. I sure did. Negamax with alpha beta pruning is clearly the winner. There are other, modern algorithms which are even faster, but I think it might be overkill for a simple game like Tic Tac Toe.

I shot an email to Micah (customer) last night and he prioritized the backlog stories so I can work on them in this new iteration. More on that in another post...