Tic-Tac-Toe (Minimax)

Overview

We have previously seen Tic-Tac-Toe in part 1 of this class. In this assignment, we are going to revisit the game and develop an alternative strategy to play the game.

For this assignment, your task is to implement a machine player for Tic-Tac-Toe that uses a Minimax strategy to decide its next move. You will be able to use the same console-based interface and graphical user interface to play the game as you did before. Although the game is played on a \(3 \times 3\) grid, your version should be able to handle any square grid (however, the time it will take to search the tree for larger grid sizes will be prohibitively slow). We will continue to use the same grid conventions that we have used previously.

This project does not require you to write a lot of code. It does, however, bring together a lot of concepts that we have previously seen in the class. We would like you to think about how these concepts are coming together to enable you to build a relatively complex machine player with very little code. Further, you should think about the situations in which Minimax or Monte Carlo might produce better/worse machine players for games other than Tic-Tac-Toe.

Provided Code

We have provided a TTTBoard class for you to use. This class keeps track of the current state of the game board. You should familiarize yourself with the interface to the TTTBoard class in the poc_ttt_provided module. The provided module also has a switch_player(player) function that returns the other player (PLAYERX or PLAYERO). The provided module defines the constants EMPTY, PLAYERX, PLAYERO, and DRAW for you to use in your code. The provided TTTBoard class and GUI use these same constants, so you will need to use them in your code, as well.

At the bottom of the provided template, there are example calls to the GUI and console game player. You may uncomment and modify these during the development of your machine player to actually use it in the game. Note that these are the same calls we used previously for your Monte Carlo strategy, so they take an ntrials parameter. You can pass anything you want as ntrials, since you will not be using it for Minimax. In order to allow us to use the same infrastructure, we have also provided a move_wrapper function in the template that you can pass to play_game and run_gui. This wrapper simply translates between the inputs and outputs of your function and those that were expected if you were implementing a Monte Carlo player.

Machine Player Strategy

Your machine player should use a Minimax strategy to choose the next move from a given Tic-Tac-Toe position. As the objective of this assignment is to help you bring together what you have learned, we ask that you do not search for pseudo-code to implement Minimax. At this point in the class, we hope that you can use the examples in the lectures and an English language description and be able to implement Minimax.

The general idea on Minimax is to search the entire game tree alternating between minimizing and maximizing the score at each level. For this to work, you need to start at the bottom of the tree and work back towards the root. However, instead of actually building the game tree to do this, you should use recursion to search the tree in a depth-first manner. Your recursive function should call itself on each child of the current board position and then pick the move that maximizes (or minimizes, as appropriate) the score. If you do this, your recursive function will naturally explore all the way down to the bottom of the tree along each path in turn, leading to a depth first search that will implement Minimax. The following page describes the process in more detail.

As you recursively call your minimax function, you should create a copy of the board to pass to the next level. When the function returns, you no longer need that copy of the board. In this manner, you are dynamically constructing the part of the game tree that you are actively looking at, but you do not need to keep it around.

For this mini-project, you need only implement one function:

  • mm_move(board, player): This function takes a current board and which player should move next. The function should use Minimax to return a tuple which is a score for the current board and the best move for the player in the form of a (row, column) tuple. In situations in which the game is over, you should return a valid score and the move (-1, -1). As (-1, -1) is an illegal move, it should only be returned in cases where there is no move that can be made.

You should start from this code that imports the Tic-Tac-Toe class and a wrapper function to enable you to play the game. You may add extra helper functions if so desired.

Hints

You do not need to write a lot of code to implement Minimax, but it can be difficult to get everything working correctly. Here are some hints that may help you out:

  • Do not forget the base case in your recursive function. Think carefully about when you can return with an answer immediately.
  • Remember to make a copy of the board before you recursively call your function. If you do not, you will modify the current board and you will not be searching the correct tree.
  • The SCORES dictionary is useful. You should use it to score a completed board. For example, the score of a board in which X has won should be SCORES[provided.PLAYERX]. If the game is a draw, you should score the board as 0.
  • In Minimax, you need to alternate between maximizing and minimizing. Given the SCORES that we have provided you with, player X is always the maximizing player and play O is always the minimizing player. You can use an if-else statement to decide when to maximize and when to minimize. But, you can also be more clever by noticing that if you multiply the score by SCORES[player] then you can always maximize. Why? Because this has the effect of negating player O’s scores allowing you to maximize instead of minimize for player O.
  • Minimax can be slow when there are a lot of moves to explore. The way we have set up the scoring, you do not always need to search everything. If you find a move that yields a winning score (+1 for X or -1 for O), you know that you cannot do any better by continuing to search the other possible moves from the current board. So, you can just return immediately with the score and move at that point. This will significantly speed up the search.

Submission

Owltest page

"""
Provided Code for Tic-Tac-Toe
"""

# Constants
EMPTY = 1
PLAYERX = 2
PLAYERO = 3 
DRAW = 4

# Map player constants to letters for printing
STRMAP = {EMPTY: " ",
          PLAYERX: "X",
          PLAYERO: "O"}

class TTTBoard:
    """
    Class to represent a Tic-Tac-Toe board.
    """

    def __init__(self, dim, reverse = False, board = None):
        """
        Initialize the TTTBoard object with the given dimension and 
        whether or not the game should be reversed.
        """
 
        self._dim = dim
        self._reverse = reverse
        if board == None:
            # Create empty board
            self._board = [[EMPTY for dummycol in range(dim)] 
                           for dummyrow in range(dim)]
        else:
            # Copy board grid
            self._board = [[board[row][col] for col in range(dim)] 
                           for row in range(dim)]
            
    def __str__(self):
        """
        Human readable representation of the board.
        """
        rep = ""
        for row in range(self._dim):
            for col in range(self._dim):
                rep += STRMAP[self._board[row][col]]
                if col == self._dim - 1:
                    rep += "\n"
                else:
                    rep += " | "
            if row != self._dim - 1:
                rep += "-" * (4 * self._dim - 3)
                rep += "\n"
        return rep

    def get_dim(self):
        """
        Return the dimension of the board.
        """
        return self._dim
    
    def square(self, row, col):
        """
        Returns one of the three constants EMPTY, PLAYERX, or PLAYERO 
        that correspond to the contents of the board at position (row, col).
         """
        return self._board[row][col]

    def get_empty_squares(self):
        """
        Return a list of (row, col) tuples for all empty squares
        """
        empty = []
        for row in range(self._dim):
            for col in range(self._dim):
                if self._board[row][col] == EMPTY:
                    empty.append((row, col))
        return empty

    def move(self, row, col, player):
        """
        Place player on the board at position (row, col).
        player should be either the constant PLAYERX or PLAYERO.
        Does nothing if board square is not empty.
        """
        if self._board[row][col] == EMPTY:
            self._board[row][col] = player

    def check_win(self):
        """
        Returns a constant associated with the state of the game
            If PLAYERX wins, returns PLAYERX.
            If PLAYERO wins, returns PLAYERO.
            If game is drawn, returns DRAW.
            If game is in progress, returns None.
        """
        board = self._board
        dim = self._dim
        dimrng = range(dim)
        lines = []

        # rows
        lines.extend(board)

        # cols
        cols = [[board[rowidx][colidx] for rowidx in dimrng]
                for colidx in dimrng]
        lines.extend(cols)

        # diags
        diag1 = [board[idx][idx] for idx in dimrng]
        diag2 = [board[idx][dim - idx -1] 
                 for idx in dimrng]
        lines.append(diag1)
        lines.append(diag2)

        # check all lines
        for line in lines:
            if len(set(line)) == 1 and line[0] != EMPTY:
                if self._reverse:
                    return switch_player(line[0])
                else:
                    return line[0]

        # no winner, check for draw
        if len(self.get_empty_squares()) == 0:
            return DRAW

        # game is still in progress
        return None
            
    def clone(self):
        """
        Return a copy of the board.
        """
        return TTTBoard(self._dim, self._reverse, self._board)

def switch_player(player):
    """
    Convenience function to switch players.
    
    Returns other player.
    """
    if player == PLAYERX:
        return PLAYERO
    else:
        return PLAYERX

def play_game(mc_move_function, ntrials, reverse = False):
    """
    Function to play a game with two MC players.
    """
    # Setup game
    board = TTTBoard(3, reverse)
    curplayer = PLAYERX
    winner = None
    
    # Run game
    while winner == None:
        # Move
        row, col = mc_move_function(board, curplayer, ntrials)
        board.move(row, col, curplayer)

        # Update state
        winner = board.check_win()
        curplayer = switch_player(curplayer)

        # Display board
        print board
        print
        
    # Print winner
    if winner == PLAYERX:
        print "X wins!"
    elif winner == PLAYERO:
        print "O wins!"
    elif winner == DRAW:
        print "Tie!"
    else:
        print "Error: unknown winner"

"""
Mini-max Tic-Tac-Toe Player
"""

import poc_ttt_gui
import poc_ttt_provided as provided

# Set timeout, as mini-max can take a long time
import codeskulptor
codeskulptor.set_timeout(60)

# SCORING VALUES - DO NOT MODIFY
SCORES = {provided.PLAYERX: 1,
          provided.DRAW: 0,
          provided.PLAYERO: -1}

def mm_move(board, player):
    """
    Make a move on the board.
    
    Returns a tuple with two elements.  The first element is the score
    of the given board and the second element is the desired move as a
    tuple, (row, col).
    """
    # base case
    if board.check_win() is not None:
        return SCORES[board.check_win()], (-1, -1)
    # recursion
    res = (-10, (-1, -1))
    
    for move in board.get_empty_squares():
        working_board = board.clone()
        working_board.move(move[0], move[1], player)
        score, _ = mm_move(working_board, provided.switch_player(player))
        if score * SCORES[player] == 1:
            return score, move
        elif score * SCORES[player] >= res[0]:
            res = (score * SCORES[player], move)

    
    return res[0] * SCORES[player], res[1]


def move_wrapper(board, player, trials):
    """
    Wrapper to allow the use of the same infrastructure that was used
    for Monte Carlo Tic-Tac-Toe.
    """
    move = mm_move(board, player)
    assert move[1] != (-1, -1), "returned illegal move (-1, -1)"
    return move[1]

# Test game with the console or the GUI.
# Uncomment whichever you prefer.
# Both should be commented out when you submit for
# testing to save time.

# provided.play_game(move_wrapper, 1, False)        
# poc_ttt_gui.run_gui(3, provided.PLAYERO, move_wrapper, 1, False)