Week 3: Prioritized Search: Uniform Cost, Greedy Best-First, and A* Search

Chris Tralie

Important New Terms: priority queue, cumulative cost, formal heuristic, admissibility, consistency

Today we discussed a set of search algorithms that use a priority queue instead of a queue; that is, every state on the queue is associated with a cost, and the lowest cost states come off of the queue first, regardless of when they were added. The choice of costs leads to different algorithms. The first one, uniform cost search, is still an uninformed technique, which means it doesn't know anything about the problem other than the transition graph. The final two, greedy best-first search and A* search, use something called a formal heuristic, which incorporates domain-specific knowledge about how far a state is from the solution.


Uniform Cost Search

The first algorithm is a lot like breadth-first search, except we explicitly keep track of the cumulative cost of the steps taken to reach each state. We'll store on our priority queue this cost, the state, and the previous state. The pseudocode is below. Note that we can use the previous states to trace the shortest path from the goal back to the start.


UniformCost(start, goal):
	state = start
	visited = set([])
	frontier = []  # This is a priority queue now
	# Queue holds 3-tuples with (cumulative cost, state, previous state)
	frontier.add( (0, start, null) ) 
	reachedGoal = false
	while not reachedGoal and length(frontier) > 0:
		# Take out the lowest cost element
		(cumucost, state, prev) = frontier.pop()
		if state is goal:
			reachedGoal = true
		else:
			visited.add(state)
			for n in state.neighbors():
				cost = distance(state, n) # Cost of taking the step from state to neighbor n
				cumucostn = cumucost + cost # Moving to the neighbor costs as much as it
				# took to move to state, plus the cost of moving from state to neighbor
				if not n in frontier and not n in visited:
					frontier.add( (cumucostn, n, state) )
				else if n in frontier: # See if we just found a shorter path to n
					if cumucostn < frontier.cost(n):
						Update cost of n to be cumucostn on frontier

Overall, the pseudocode is fairly similar to that of breadth-first graph search, except it keeps track of a cost and uses a priority queue instead of a regular queue (other changes are bolded). Let's look at a picture of this algorithm in the middle of its run, as shown below. Suppose we just took state off of the queue, and state has 4 neighbors a, b, c, and d, which are at distances d1, d2, d3, and d4 from state, respectively. Then, assuming this is the first time we're seeing these neighbors, we'll add them to the priority queue with the distance it took to get from start to state (cumucost), plus the distance to get from state to each neighbor. At this point, this is our best estimate of the length of the shortest path from start to each of these neighbors.

Then, suppose that later, before we get to one of these states (say a), we take a state p off of the queue which has a as a neighbor. If the distance x from the start to p, plus the distance d1' from p to a, happens to be smaller than the estimate cumucost+d1 that we had before, then we update the priority of a to be the smaller distance x+d1'. Because the priority queue always removes the lowest cost element first, the first time a state comes off of the queue, its priority is equal to its shortest distance from the start. As a corollary, we reach the goal state with a cost equal to that of the shortest possible path to it.

Those who are familiar with Dijkstra's algorithm for finding shortest paths in graphs may realize that uniform cost search is essentially the same thing, except we build the graph on the fly instead of starting with the full graph on the queue with all distances of infinity.

As a final note, uniform cost search is actually equivalent to breadth-first search if all of the neighbor transition costs are the same. This is because, as we recall, breadth-first search visits states in increasing order of the number of steps from the start. If each step has the same cost, then the cost of a path is the number of steps times a constant, so the order is the same.
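To make this concrete, here is a minimal sketch of uniform cost search in real Python, using the heapq library discussed at the end of these notes. The neighbors and distance functions are hypothetical stand-ins for whatever problem is being solved (states are assumed hashable and comparable), and rather than updating priorities in place, this version pushes duplicate entries and skips any state that has already been visited, which has the same effect:

import heapq

def uniform_cost(start, goal, neighbors, distance):
	# Queue holds (cumulative cost, state, previous state)
	frontier = [(0, start, None)]
	visited = set()
	prev = {}  # Maps each state to the previous state on the cheapest path found to it
	while len(frontier) > 0:
		(cumucost, state, p) = heapq.heappop(frontier)
		if state in visited:
			continue  # Stale duplicate entry; we already found a cheaper path to this state
		visited.add(state)
		prev[state] = p
		if state == goal:
			path = [state]  # Trace the shortest path from the goal back to the start
			while prev[path[-1]] is not None:
				path.append(prev[path[-1]])
			return cumucost, path[::-1]
		for n in neighbors(state):
			if n not in visited:
				heapq.heappush(frontier, (cumucost + distance(state, n), n, state))
	return None  # The goal is unreachable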

Let's look at an example of a pixel graph using 8 neighbors instead of 4. We'll make the horizontal and vertical neighbors have a distance of 1 and the diagonal neighbors have a distance of sqrt(2), as shown in the image below

Below is an animation showing how regular breadth-first graph search would do on this. Notice how the frontier moves out in a square (as opposed to moving out in a diamond for 4 neighbors)

By contrast, notice how with uniform cost search the frontier moves out in more of a circle, so the pixels on the frontier are closer to truly being the same distance from the start; hence the name "uniform cost"

Note, though, that if we switch back to 4 neighbors, uniform cost search reverts to doing exactly what breadth-first graph search does with 4 neighbors: it moves out in a diamond. This is because all neighbors are at the same cost of 1.


Greedy Best-First Search

Let's look at another algorithm that uses a priority queue, but one that falls under the class of informed search algorithms; that is, we actually know something about the problem at hand beyond simply how the neighbors are connected. In particular, we'll define something called a formal heuristic, which is an estimate of the length of the shortest path between two states. We denote this as h(a, b) between states a and b. Usually, we're most interested in how long it takes to get to the goal, so we'll abbreviate h(a, g), where g is the goal, as simply h(a). So if you only see one parameter, it's the heuristic's estimated distance to the goal.

At the very least, a heuristic should be admissible; it should never over-estimate the true cost left along the shortest path between states. One example of an admissible heuristic for our pixel graphs is the straight line "Euclidean distance" from a state at location (x, y) to the goal at location (gx, gy)

\[ d((x, y), (g_x, g_y)) = \sqrt{ (x-g_x)^2 + (y-g_y)^2 } \]
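In code, this heuristic is tiny. Here's a sketch, assuming states are (x, y) tuples of grid coordinates:

import math

def euclidean_heuristic(state, goal):
	# Straight line distance from the state (x, y) to the goal (gx, gy)
	(x, y) = state
	(gx, gy) = goal
	return math.sqrt((x - gx)**2 + (y - gy)**2)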

We'll prove that this particular heuristic is admissible using a stronger condition in a moment. But for now, the pseudocode for greedy best-first search with a general heuristic is shown below, with the modifications from uniform cost search bolded


GreedyBestFirst(start, goal, heuristic):
	state = start
	visited = set([])
	frontier = []  # This is a priority queue now
	# Queue holds 2-tuples with (heuristic, state)
	frontier.add( (heuristic(start), start) ) 
	reachedGoal = false
	while not reachedGoal and length(frontier) > 0:
		# Take out the lowest cost element
		(heuristic_state, state) = frontier.pop()
		if state is goal:
			reachedGoal = true
		else:
			visited.add(state)
			for n in state.neighbors():
				if not n in frontier and not n in visited:
					frontier.add( (heuristic(n), n) )
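As with uniform cost search, here's a minimal sketch of what this looks like in real Python with heapq; again, the neighbors and heuristic functions are hypothetical stand-ins for the problem at hand, and here "visited" means "already added to the frontier at some point":

import heapq

def greedy_best_first(start, goal, neighbors, heuristic):
	frontier = [(heuristic(start), start)]  # Queue holds (heuristic, state)
	visited = {start}
	while len(frontier) > 0:
		(h, state) = heapq.heappop(frontier)
		if state == goal:
			return True  # Found the goal (but not necessarily via a shortest path!)
		for n in neighbors(state):
			if n not in visited:
				visited.add(n)
				heapq.heappush(frontier, (heuristic(n), n))
	return False  # The goal is unreachable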

Let's look at how this does on a wide open grid with no obstacles, using 8 neighbors

Seems like it gets there pretty quickly! But let's switch it up a bit and place some obstacles in the way, as shown below (start shown in blue and goal shown in red, with obstacles in black)

In this case, greedy best-first search still has to expand relatively few nodes to find a solution, but the path it finds gets tricked into taking a longer route of 40 steps. By contrast, uniform cost search has to expand more nodes to find the solution, but it finds the shortest path, which takes only 28 steps. So we can't rely on greedy best-first search to give us the shortest path to the solution, only some path. Below are the solutions shown side by side

Greedy Best-First (40 Steps)

Uniform Cost (28 Steps)



A* Search

As it turns out, we can get the best of both worlds by doing something called A* search, which uses a priority that is the sum of the two costs used in uniform cost and greedy best-first. In other words, A* estimates the total distance from start to finish as the sum of the cost taken so far and the heuristic h, and it prioritizes nodes which seem to be on the shortest such estimated paths. Said differently, a node's priority is: if I force you to go through that node, what's the lowest cost path I think I can come up with? We use the heuristic to estimate the part of the path we don't know yet. The picture below depicts this. Two quick notes:

  • Our estimated cost of the optimal path from start to goal is cumucost + h(state)
  • The cost to move to a neighbor is d, so once we move to the neighbor n, the cumulative cost becomes cumucost + d. The heuristic at the neighbor is h(n), so that neighbor's A* priority on the queue becomes cumucost + d + h(n).

Below is some pseudocode to implement this, with changes bolded with respect to uniform cost search. Note how we prioritize by the sum of the cumulative cost and the heuristic, but we have to remember the cumulative cost separately to be able to accurately compute the cumulative cost to neighbors. So we end up actually remembering 4 things in each priority queue entry.


AStar(start, goal, heuristic):
	state = start
	visited = set([])
	frontier = []  # This is a priority queue now
	# Queue holds 4-tuples with (cumulative cost + heuristic(state), cumulative cost, state, previous state)
	# NOTE: It is important for cumulative cost to be the second element so that an optimal solution with shorter cumulative
	# cost comes out before a non-optimal solution that ties in cumulative cost + heuristic
	frontier.add( (heuristic(start), 0, start, null) ) 
	reachedGoal = false
	while not reachedGoal and length(frontier) > 0:
		# Take out the lowest cost element
		(est, cumucost, state, prev) = frontier.pop() # est is priority: cumulative cost + heuristic
		if state is goal:
			reachedGoal = true
		else:
			visited.add(state)
			for n in state.neighbors():
				d = distance(state, n) # Distance of taking step from state to neighbor 
				costn = cumucost + d + heuristic(n) # Moving to the neighbor costs as much as it
				# took to move to state, plus the cost of moving from state to neighbor,
				# plus the heuristic
				if not n in frontier and not n in visited:
					frontier.add( (costn, cumucost + d, n, state) )
				else if n in frontier: # See if we just found a shorter path to n
					if costn < frontier.cost(n):
						Update cost of n to be costn on frontier

Below is the same example as before with the obstacles. As you can see, it still searches in the direction of the solution much more quickly than uniform cost search, but it doesn't get messed up as much by the obstacles, and it correctly returns the shortest path. In fact, an admissible heuristic with the above algorithm will always guarantee that we get a shortest path to the goal (Wikipedia has a nice proof of this).


A* Search And Consistency

There's a stronger condition we can impose on our heuristic than admissibility, which is referred to as consistency. The consistency condition for a heuristic h is

\[ h(s, g) \leq h(n, g) + d(s, n) \]

where s, g, and n are any three states. If, specifically, g is the goal and s and n are neighboring states, we'll more commonly see it written as

\[ h(s) \leq h(n) + d(s, n) \]

where d(s, n) is the distance between s and n.

Intuitively, the heuristic should be roughly decreasing as we get closer to the goal, but it shouldn't drop by too much between neighboring states. Below we'll explore some interesting properties of consistent heuristics which make them so useful in practice

Proposition 1: Every Consistent Heuristic Is Admissible

We can show by induction that every consistent heuristic is admissible (though admissible heuristics are not necessarily consistent). Let h*(s) be the minimum cost of any path from the state s to a goal state, and let s, s_2, s_3, ..., s_n be a sequence of states that realizes this cost, with s_n being a goal state. Then, by the consistency condition, we can write

\[ h(s) \leq h(s_2) + d(s, s_2) \]

and then, plugging in the consistency condition for s3

\[ h(s) \leq h(s_3) + d(s_2, s_3) + d(s, s_2) \]

and then, we continue by induction until, using the fact that h(s_n) = 0 (the heuristic's estimate at a goal state), we get

\[ h(s) \leq d(s, s_2) + d(s_2, s_3) + \ldots + d(s_{n-1}, s_n) = h^*(s) \]

Proposition 2: No Need To Update The Priority Queue with Consistent Heuristics for Constant Step Size

One really nice thing about A* with consistent heuristics is that we can guarantee that, like uniform cost search (but unlike A* with merely admissible heuristics), the first time a state comes out of the queue, it has the lowest possible A* cost that any path to it ever will. Furthermore, if the costs of all steps are the same (as they are in the 8-puzzle and rush hour), we can also guarantee that the state has the lowest cost the first time it's added to the frontier. This means we can cut out the check to see if a node is already on the frontier with a higher cost, which is great, because updating priorities on Python priority queues is a pain. This will make your life much easier on homework 2. Below is the pseudocode for A* search assuming a consistent heuristic

AStarConsistent(start, goal, heuristic):
	state = start
	visited = set([])
	frontier = []  # This is a priority queue now
	# Queue holds 4-tuples with (cumulative cost + heuristic(state), cumulative cost, state, previous state)
	# NOTE: It is important for cumulative cost to be the second element so that an optimal solution with shorter cumulative
	# cost comes out before a non-optimal solution that ties in cumulative cost + heuristic
	frontier.add( (heuristic(start), 0, start, null) ) 
	visited.add(start) # In this version, "visited" means "added to the frontier", so mark the start
	reachedGoal = false
	while not reachedGoal and length(frontier) > 0:
		# Take out the lowest cost element
		(est, cumucost, state, prev) = frontier.pop()
		if state is goal:
			reachedGoal = true
		else:
			for n in state.neighbors():
				if not n in visited:
					visited.add(n)
					d = distance(state, n) # Distance of taking step from state to neighbor 
					costn = cumucost + d + heuristic(n) # Moving to the neighbor costs as much as it 
					# took to move to state, plus the cost of moving from state to neighbor,
					# plus the heuristic
					frontier.add( (costn, cumucost + d, n, state) )
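Here's what this can look like in real Python with heapq, following the same 4-tuple convention. As before, the neighbors, distance, and heuristic functions are hypothetical stand-ins for your problem; this sketch also remembers previous states in a dictionary so it can return the solution path:

import heapq

def astar_consistent(start, goal, neighbors, distance, heuristic):
	# Queue holds (cumulative cost + heuristic, cumulative cost, state, previous state)
	frontier = [(heuristic(start), 0, start, None)]
	visited = {start}  # Here "visited" means "added to the frontier at some point"
	prev = {start: None}
	while len(frontier) > 0:
		(est, cumucost, state, p) = heapq.heappop(frontier)
		if state == goal:
			path = [state]  # Trace the solution from the goal back to the start
			while prev[path[-1]] is not None:
				path.append(prev[path[-1]])
			return cumucost, path[::-1]
		for n in neighbors(state):
			if n not in visited:
				visited.add(n)
				prev[n] = state
				d = distance(state, n)
				heapq.heappush(frontier, (cumucost + d + heuristic(n), cumucost + d, n, state))
	return None  # The goal is unreachable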

Proposition 3: Special Case with Unit Cost Steps

There's another tool we can use if we know neighboring states are at a distance of 1, just as they are in the rush hour problem and the 8-puzzle problem. In this case, the definition of consistent simplifies to

\[ h(s) \leq h(n) + 1 \]

Since the order of s and n in this equation is arbitrary, what this implies in general is

\[ |h(s) - h(n)| \leq 1 \]

In other words, a consistent heuristic should never jump up or down by more than 1 between two neighboring states in problems with unit cost steps between neighbors.

With this fact, it's trivial to prove that the blocking heuristic for the rush hour problem is consistent:

  • If we're not at the goal and there's a car in the way, we have a heuristic of 2. If we then move a car out of the way to make a clear path, that takes 1 step, and our heuristic goes down to 1.
  • If we have a clear path to the goal but we're not at the goal, we have a heuristic of 1, and we have to take at least 1 step to get to the goal.

Proposition 4: If A Heuristic Satisfies The Triangle Inequality And Is Admissible, Then It Is Consistent

Here's a really nice tool that we can use to prove that particular heuristics are consistent, or to tweak estimates slightly so that they become consistent heuristics. By triangle inequality, we mean the following:

\[ d(a, c) \leq d(a, b) + d(b, c) \]

This is depicted in the picture below:

As that picture suggests, it's definitely true of the straight line distance in Euclidean space. It's also true of the Kendall-Tau distance between two permutations, which we'll use in a moment for the 8-puzzle.

To prove the above assertion, let's apply the triangle inequality, letting a = s be some state, c = g be the goal, and b = n be a neighboring state. Then, if h satisfies the triangle inequality, we have

\[ h(s, g) \leq h(s, n) + h(n, g) \]

Since our heuristic is also admissible, h(s, n) ≤ d(s, n) (the single step from s to n is itself a path, so the true shortest path cost between them is at most d(s, n)). Therefore, we have the inequality

\[ h(s, g) \leq d(s, n) + h(n, g) \]

Or, for short

\[ h(s) \leq h(n) + d(s, n) \]

which is the definition of a consistent heuristic! Q.E.D.

Ex) Straight Line Euclidean Distance is Consistent

As an example, we can apply this theorem to show that the straight line heuristic we used on grids is consistent for both 4 neighbor and 8 neighbor grids: straight line distance satisfies the triangle inequality, and it's admissible because the straight line between two points is never longer than any path along grid edges between them.

Ex) 1/2 of The Kendall-Tau Distance Is A Consistent Heuristic for 8-Puzzle

Let's consider an example of two neighboring puzzle configurations, where we move a tile down to get from the one on the left to the one on the right (I'll leave open what the tiles actually are by making them variables). Let's then examine what happens to the permutations we get when we read the tiles in "row-major" order (left to right, top to bottom).

Before:          After (c moved down):
a b c            a b _
d e _            d e c
f g h            f g h

Reading each in row-major order (skipping the blank), the left configuration gives the permutation a b c d e f g h, and the right gives a b d e c f g h.

We notice that there are exactly 2 discordant pairs (pairs that appear in different orders in the two permutations): (c, e) and (c, d). Therefore, the Kendall-Tau distance is 2. By similar reasoning, any up or down move will be at a Kendall-Tau distance of 2. And, in fact, horizontal moves will be at a Kendall-Tau distance of 0, as the example below shows

Before:          After (e moved right):
a b c            a b c
d e _            d _ e
f g h            f g h

Reading each in row-major order (skipping the blank) gives the same permutation both times: a b c d e f g h.

A single move of cost 1 changes the Kendall-Tau distance by at most 2, so half the Kendall-Tau distance changes by at most 1 per move; since it's 0 at the goal, it can never exceed the number of moves remaining. In other words, we get an admissible heuristic if we take half of it.

We can also see that the Kendall-Tau distance satisfies the triangle inequality. To prove this, note that another interpretation of this distance is the minimum number of adjacent swaps we have to make to get from one permutation to another. One can count this, for instance, as the number of adjacent swaps that insertion sort makes, though there are more efficient ways to compute this. Let

  • k(a, b) be the minimum number of adjacent swaps to get from permutation a to permutation b
  • k(b, c) be the minimum number of adjacent swaps to get from permutation b to permutation c
  • Then if we follow the swaps prescribed by k(a, b), followed by the swaps prescribed by k(b, c), we have a way of turning permutation a into permutation c that uses a total of k(a, b) + k(b, c) adjacent swaps. But this is not necessarily the optimal sequence of adjacent swaps, so we can say that k(a, c) is at most k(a, b) + k(b, c).

Taking 1/2 of this preserves the triangle inequality, so we're left with an admissible heuristic that satisfies the triangle inequality! As I was showing in class, this heuristic makes a huge difference in practice; on a puzzle with 19 steps in an optimal solution, breadth-first graph search has to expand 29434 states, while A* with this heuristic only has to expand 2067 states!
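Here's a sketch of computing the Kendall-Tau distance in Python, where each puzzle state is flattened into a list of its tiles in row-major order (skipping the blank). The O(n²) pair count below is plenty fast for 8 tiles, though asymptotically faster versions exist:

def kendall_tau(p, q):
	# Count discordant pairs: pairs of tiles that appear in a different order in p than in q
	pos = {tile: i for (i, tile) in enumerate(q)}  # Where each tile lives in q
	seq = [pos[tile] for tile in p]
	n = len(seq)
	return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])

# The vertical move example from above: the distance is 2, so the heuristic is 1
print(kendall_tau(list("abcdefgh"), list("abdecfgh")) / 2)  # Prints 1.0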

Below, the solution returned by A* is shown (with the blank drawn as _), as well as the heuristic at each step. Notice that the heuristic is roughly decreasing as we get closer to the goal, and it never changes up or down by more than 1.

Step 1 (h=5)

2 6 1
7 _ 3
5 8 4

Step 2 (h=5)

2 6 1
7 3 _
5 8 4

Step 3 (h=6)

2 6 _
7 3 1
5 8 4

Step 4 (h=6)

2 _ 6
7 3 1
5 8 4

Step 5 (h=5)

2 3 6
7 _ 1
5 8 4

Step 6 (h=5)

2 3 6
7 1 _
5 8 4

Step 7 (h=4)

2 3 6
7 1 4
5 8 _

Step 8 (h=4)

2 3 6
7 1 4
5 _ 8

Step 9 (h=4)

2 3 6
7 1 4
_ 5 8

Step 10 (h=3)

2 3 6
_ 1 4
7 5 8

Step 11 (h=3)

2 3 6
1 _ 4
7 5 8

Step 12 (h=3)

2 3 6
1 4 _
7 5 8

Step 13 (h=2)

2 3 _
1 4 6
7 5 8

Step 14 (h=2)

2 _ 3
1 4 6
7 5 8

Step 15 (h=2)

_ 2 3
1 4 6
7 5 8

Step 16 (h=1)

1 2 3
_ 4 6
7 5 8

Step 17 (h=1)

1 2 3
4 _ 6
7 5 8

Step 18 (h=0)

1 2 3
4 5 6
7 _ 8

Step 19 (h=0)

1 2 3
4 5 6
7 8 _

Python Priority Queues

When you implement prioritized search algorithms in Python, you can make use of the heapq library. Below is an example of how to use it. (You should try this out in Jupyter until you're comfortable!)
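For instance, here's a minimal sketch that pushes a few prioritized items and pops them back off in sorted order:

import heapq

frontier = []  # heapq operates on an ordinary Python list
heapq.heappush(frontier, (5, "medium priority"))
heapq.heappush(frontier, (10, "low priority"))
heapq.heappush(frontier, (1, "high priority"))
while len(frontier) > 0:
	print(heapq.heappop(frontier))
# Prints (1, 'high priority'), then (5, 'medium priority'), then (10, 'low priority')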

Note that it's possible to push tuples of information, so you can store more than just the priority on the queue. Tuples are compared starting with the first element; if there are ties, the second elements are compared, and so on. This means that every element of the tuple should be comparable with <, which, unfortunately, means you can't put a dictionary in the tuple.

In the case of A*, you'll want to push a 4-tuple of (heuristic + cumucost, state, prev, cumucost). For ties, state should be comparable. This is why the __lt__ method is overloaded in the most recent starter code.