CSE 421 Notes

Table of Contents

These are Mark Polyakov's notes from taking CSE 421, Algorithms, at the University of Washington during Winter 2021/2.

1. Big O

Logarithms in any base are in the same complexity class as logarithms in any other base, ie \(\log_a n\in O(\log_b n)\) for all a and b.

Any power of a logarithm is of sub-linear complexity, ie \(\log^k(n)\in O(n)\). Different powers of logarithm are in different complexity classes: \(\log n\notin O(\log^2 n)\).

\(k^n\notin O((k-1)^n)\). In the same vein, \(k^{ln}\notin O(k^{(l+1)n})\). This means something important: Two expressions, whose exponents (or bases) have the same time complexity, may not as a whole have the same time complexity! The "constant" factors can be big deal up there. In fact, if you have \(2^x\) and \(2^y\) and can show that \(y/x>r>0\) as \(n\to\infty\), then \(2^y\notin O(2^x)\).

Also note that, eg \(5n=e^{\log {5n}}\). When logarithms appear in the exponent, it still may be polynomial complexity.

Common tricks for comparing complexities:

  • Cancel out polynomial factors of n.
  • Convert all into exponential, then compare exponents.

Sometimes, the runtime is in terms of the optimal solution, such as \(O(nU)\). This is called "pseudo-polynomial". We call \(O(m\log U)\) "weakly polynomial" and \(O(m)\) "strongly polynomial".

2. Graphs

Tree: Exactly one path between any two vertices.

Connected: Every vertex is accessible from every other vertex.

For this class, undirected graphs can't have self-edges.

2.1. Maximum Number of Edges

Maximum number of edges in an undirected graph with n vertices: \(\frac{n(n-1)}{2}\). (every vertex connected to every other).

Maximum number of edges in a tree:

Lemma: A tree plus an extra edge is not a tree. Now there are two paths between the nodes connected by the new edge.

Using lemma ^^: Maintain a set of vertices and a set of edges. Add any one vertex to the vertex set. Repeatedly, choose any unexplored edge touching one of the explored vertices. This will add exactly one new vertex and one new edge to the maintained sets (certainly one edge. certainly not more than one vertex. Certainly at least one vertex because there's only one path to each vertex from the original vertex, and this is the first time exploring the path containing this edge, so it must be new). Thus, the size of the vertex set is always one greater than the size of the edge set. We terminate when all vertexes have been explored. At this point there will be n-1 edges. The set of explored edges and vertices is a tree (every two vertexes are connected by a path through the original vertex, and its a subgraph of a tree), and therefore there are always exactly n-1 edges.

Another way to prove: lemma: (assuming at least two vertices) at least one vertex has only a single edge. Else, you could keep going forwards on new edges, but finite number of vertices, so you'd eventually make a loop, which implies two ways to connect any two vertices in the loop, so not a tree. Now to use the lemma: Claim that each tree has n-1 edges. Suppose true for n-1 (i.e., tree with n-1 vertices has n-2 edges). Now take the tree with n vertices. One of the vertices has exactly one edge, by the lemma. Consider the tree formed by removing that vertex and edge. It's certainly still a tree, since no vertices were connected through the removed vertex. By inductive hypothesis, smaller tree has n-1 vertex and n-2 edges. Adding back in v, the full tree has n vertex and n-1 edges.

2.2. Storage of graphs

"Adjacency Array": Two arrays, first one is how many neighbors each vertex has, second stores the indexes of the neighbors.

Adjacency list: An array of linked list heads, each list being all the nodes touching that first node.

Adjacency matrix: Alright, but takes up lots of space, even though graphs are usually sparse.

2.3. bipartitep

Bipartite: Can choose two sets of vertices, so that all edges cross between sets.

Equivalently, a bipartite graph is "2-colorable": We can color each node such that it is only connected to nodes of the other color.

If a graph has an odd-length cycle, it's certainly not bipartite.

Color some node red. Add all the edges touching that node to a queue. Iterate the edges in the queue and if one side is colored, color the other side opposite. If already colored correctly, do nothing. If both sides have same color, return false. Every time a node is colored, add all its edges to the queue if not already processed.

The way taught in class: Do a BFS to generate levels away from some root vertex.

2.4. BFS

BFS and DFS on graphs are not trivial, because the graph is not already a tree. It's not quite as simple as when you do BFS or DFS on something that's already a tree. Instead, both BFS and DFS create a tree on top of the graph. There are different properties of these trees, and special properties of the non-tree edges.

A trivial application of BFS (or DFS) is to find the connected component around a node. (in undirected, at least)

"Nontree edges" in BFS are either between nodes at most one level apart (level = distance from root.)

2.5. DFS

To find the youngest common ancestor of two nodes, we can't really use BFS. Imagine the graph 1->2, 1->3, 2->4, 2->5, 3->4, 3->5. A reasonable BFS result is 1->2, 1->3, 2->4, 3->5. The youngest common ancestor of 4 and 5 is 2 or 3, but the BFS seems to indicate 1. Instead, we should use DFS here.

The edges in the DFS tree can be weird – between levels, or across the same level (level = minimum distance to starting point). However, the "non-tree" edges only go between ancestors. The only reason an edge x,y would not be explored by DFS is that, while some other edge from x was being explored, y was explored already. But this means y is a descendent of x.

An invariant maintained during DFS: If a node is fully explored (recursion returned from that node), then all descendants of that node are fully explored. (This is not true for BFS)

2.6. Topological Sort

topological sort is possible iff the graph is a DAG. topo sort => DAG: cycle certainly prevents topo sort. DAG => topo sort: It's true that every DAG has a vertex with no incoming edges (else, you could keep traversing backwards indefinitely. Similarly, there must be one vertex with no outgoing edge, or you could traverse that way too). That vertex can go first. If we remove it, the graph is still a DAG, so we can choose a next vertex, etc, forming a topo sort.

One way to come up with a topo sort is to repeatedly find the vertex with no incoming edges. Keep a priority queue of nodes by the number of incoming edges. The lowest should have zero. Then decrement each node that had an edge with the removed node, and take the new lowest node. Time complexity is O(n+m log n); the O(n) is the simple loop through the vertices (also the time to build the heap) while m log n is to decrement each edge.

Using DFS we can do even better. An invariant maintained during DFS search is that, when we return from recursion on a given node, all descendents of that node have already been explored (though perhaps during earlier recursions on othe nodes). Because of this, during DFS we simply add each vertex to the front of a list once we return from its recursion, and we know that it's ahead of all its dependencies in the list. Every node gets added to the list, and every node precedes its descendants, so it's a topological sort in O(n+m) time!

2.7. Spanning Trees

Spanning tree: A subset of a graph which is a single connected tree and contains all the vertices.

Minimum spanning tree: A spanning tree which minimizes the sum of the cost of edges included in the tree.

Dijkstra's algorithm always finds a spanning tree, but it's not necessarily minimal. Eg, all edges have cost 1. Dijkstra from 1 may find 1->2->3 and 1->4->5. But if there's also an edge from 3-5, then the MST should be 1-2-3-5, not 3-2-1-4-5.

MSTs can be used to help optimize the cost of building a network (of roads, electricity, etc), and can approximate some problems like traveling salesman.

Cut: A set of edges such that if those edges are removed, the graph is split into exactly two nonempty connected components.

Cut property: For any valid cut, the lowest-cost (lightest) edge in the cut (means, removed by the cut) must be included in any MST. Certainly some edge of the cut must be included, otherwise the tree would not be connected. In fact, exactly one edge from the cut must be included. More than one would cause a loop (easy to see why). So we should certainly include the cheapest.

Cycle property: Most expensive edge of any given cycle is not in MST. If it were, then TODO

2.7.1. Kruskal's Algorithm

Add edges one by one, in order of cheapest edge that does not create a cycle.

This edge will always belong to the MST, because of cut property between all the vertices on one side of the edge which are already connected in the provisional mst, and all other nodes in the graph (None of the edges in this cut would cause a cycle, so they are all currently being considered by Kruskal, so it's picking the best in the cut). Further, once all such edges have been added, we have a tree. Thus, it must be an MST.

Kruskal's needs an efficient way to determine whether a new edge would create a cycle. Do this by maintaining a list of connected sets so far (ie, sets of vertices that form connected components by the edges added to the mst result so far). We need to quickly determine whether two nodes are in the same connected component (so we know when not to add an edge), and to merge two connected components (to support future checks).

With union find, described below, the runtime becomes O(m log m) (which is actually equal to O(m log n), because m<=n2 so log m<=2log n). This is because we loop over every edge in the graph from lowest cost to highest cost, which requires log m time to get the lowest edge from the heap, and then log n to check whether the two vertices are in the same union find group, and perhaps another log n to join the groups if we decide to keep the edge. An alternate heapless implementation where we just sort the list of edges beforehand has the same time complexity: m log m + m log n = m log m.

  1. Union Find Data Structure

    A simple, but slow, implementation uses a list of lists or 2d array, augmented by a hashmap saying which set each vertex is in (or nil). It's O(1) to find whether two vertices share a set, but merging components is bad.

    Maintains a number of sets. O(log n) time to check whether two elements are in the same set or merge two sets.

    Implementation is pretty simple: An array of size n, where each element points to a "parent", which is just another element of the set in the same group. So initially, if every element is in its own group, then A[i]=i. To merge a group, find the "root" of the group, and update it to point at an element in the other group.

    Does this have the desired runtime? Sorta. If we put 1 into 2, then 2 into 3, etc, we'll end up with something pretty…tall, for a worst case O(n) runtime. That being said, we can imagine how to do it better. When adding, say, 6 to 7, rather than setting 6 as a child of 7, we could set 7 as a child of 6. This is better, because 6 has two children, whereas 7 would have had just a single child.

    Keep track of the depth of the tree, and when merging, put the smaller one under the larger one. Why does this limit the depth to be logarithmic? Let a and b be the depths to be combined. The result is max(a,b) if a≠b, and a+1 if a=b. The only time the depth increases is when two nodes with equal depth are added. To get to depth 2, we must add two nodes of depth 1. To get higher than 2, we must add two nodes of depth 2, which takes 4 contributors. This will yield depth 3. Then we need 8 contributors to get to 4. So, to reach a depth of d, we need \(2^d\) contributors. Or, if we have \(n\) many elements, the depth is \(\log_2 d\), as desired.

    One document I found online suggested using the number of nodes in each tree rather than the depth. This is certainly less intuitive. Inductively, assume that a group of size s has depth at most \(\lceil \log_2 s\rceil\). This is true for s=1. For a larger group, it must have been made from two smaller groups, say a+b=s. WLOG assume a≤b. Then b≥s/2 and a≤s/2. By the inductive hypothesis, the depth of b is \(\le\lceil \log_2 b\rceil \le \lceil \log_2s\rceil\), and the depth of a is \(\le\lceil\log_2a\rceil\le\lceil\log_2s\rceil-1\). Since a is being added as a child of b, the depth of the resulting tree is no greater than \((\lceil\log_2s\rceil-1)+1=\lceil\log_2s\rceil\), as desired.

    With this optimization, union find has O(log n) lookup, and O(log n) combining (because we have to perform two lookups). Performance can be made effectively linear if, whenever we perform a lookup, we flatten the tree on the way, as implemented below:

    (defclass union-find ()
    (defmethod initialize-instance :after ((instance union-find) &key size)
      (declare ((integer 0) size))
      (with-slots (parents sizes) instance
        (setf parents (make-array (list size) :element-type 'integer :initial-contents (loop for i from 0 below size collect i)))
        (setf sizes (make-array (list size) :initial-element 1 :element-type 'integer))))
    (defun union-find--root-of (union-find x)
      "Find the root of x, and flatten the tree along the way."
      (declare (union-find union-find) ((integer 0) x))
      (with-slots (parents) union-find
        (loop for cur = x then parent
              for parent = (elt parents cur) then grandparent
              for grandparent = (elt parents parent)
              until (= cur parent)
              do (setf (elt parents cur) grandparent)
              finally (return parent))))
    (defmethod union-find-same-group-p ((union-find union-find) a b)
      (declare ((integer 0) a b))
      (= (union-find--root-of union-find a) (union-find--root-of union-find b)))
    (defmethod union-find-combine ((union-find union-find) a b)
      (declare ((integer 0) a b))
      (with-slots (parents sizes) union-find
        (let* ((a-root (union-find--root-of union-find a))
               (b-root (union-find--root-of union-find b))
               (a-root-size (elt sizes a-root))
               (b-root-size (elt sizes b-root))
               (a-smol-p (< a-root-size b-root-size))
               (smol-root (if a-smol-p a-root b-root))
               (big-root (if a-smol-p b-root a-root)))
          (setf (elt parents smol-root) big-root)
          (setf (elt sizes big-root) (+ a-root-size b-root-size)))))

2.7.2. Prim's Algorithm

Similar to Kruskal, but start at a vertex, and find cheapest outgoing edge from the tree so far. Works by cut property, very similarly to Kruskal's.

How do we go about finding the cheapest outgoing edge from the tree so far? We can filter the set of all edges to those not involving our set (O(m)) and find the minimum as we go along (no extra cost). We have to do this every time we adjust the set, for O(nm)

It's possibly to improve this a bit: Rather than looping through all edges, for every vertex not yet included in the MST, we keep track of the minimum cost to include it into the MST by adding a single edge (if possible). Every time we add a new edge to the MST, we iterate over its edges and update the minimum expansion costs appropriately.

We have to iterate through every vertex adding it to the tree. Each time, we also iterate over the edges of that vertex, and also have to look up that vertex in the priority queue to begin with. Each edge will be iterated over at most twice (because it has two endpoints), and we have to perform a pqueue update for each edge. so time complexity is O(m log n + n log n)=O(m log n). (Because m >= n-1 if connected)

2.7.3. Reverse-Delete

Start from the full graph, remove highest weight edge that doesn't break connectedness. Certainly doesn't remove any edges that are in the MST, but is the final result a tree? TODO

2.7.4. Expected linear MST

There is a randomized algorithm that achieves expected linear time MST.

3. Greedy Algorithms

An algorithm is greedy, roughly, when the decision you make at each step can be determined without thinking about the future.

A really simple greedy algorithm is the "cashier's algorithm": To give change using the least amount of coins, always select the largest coin <= amount remaining to be paid. This works for US coin denominations, but not for all possible denominations! Eg, if you have 16 cent coins and you want to make 32 coins, you'd better use two 16 cent coins rather than a 25, a 5, and two 1s.

It's not even obvious that cashier works on US coin denominations, because 10 does not divide 25. If each coin divides all larger coins, then it certainly is correct, because if you don't use the larger coin, you have to use a larger amount of smaller coins.

TODO: what condition do US coins meet that makes this so?

3.1. Interval Scheduling

Given a whole bunch of time intervals, find the largest subset of the intervals that does not overlap.

The solution is to sort the jobs in terms of ascending end time of the job, and always select the first job from the list that's compatible with the jobs selected thus far.

Suppose there's a better schedule, with n intervals, where the first selected one does not have the earliest ending time. Now replace the first interval with the one with earliest ending time. There are still n intervals, so it's at least as good. We can repeat on all the intervals to show that ours will have at least n intervals too. So our earliest ending time algorithm produces a list of intervals at least as long as any other.

Closely related is the classroom scheduling problem: Given a whole bunch of class times, what's the minimum number of classrooms we need?

You can't just use the interval scheduling to stuff as many classes in each classroom as possible – sometimes not optimal. Eg, 1->2, 4->5, 1->3, 2->6. The greedy algo will require three classrooms. But it can be done in 2.

Define depth of the schedule: maxt {number of intervals including time t}. Certainly can't use less classrooms than the depth. Thus, if we develop an algorithm that results in using exactly depth many rooms, we know it's optimal.

When we order in terms of start time, if there are classes in d many rooms simultaneously, then that means the depth at that time is d as well. Consider what happens every time we place a class. Inductive hypothesis: The current number of classrooms in use is the depth just left of point, so adding the new classroom will not increase the depth to anything more than the depth right at point. Finally, adding the new class will not cause a conflict, because there are no future classes placed yet for it to conflict with.

3.2. Minimizing lateness

You have some jobs to do (you're only one person). Each has a deadline \(d_j\), and takes a certain amount of time \(t_j\) to complete. What can you do to minimize the maximum lateness for any one job (sum doesn't matter)?

Selecting based on start time is certainly bad; the one with earliest start time might last the rest of the time!

You should always work on the task that's due soonest (choose arbitrarily if due at same time).

Take any given schedule. If there are two tasks that are not ordered in terms of first due first, then there exists an adjacent pair that's not ordered in terms of first due first (if not adjacent, the intermediate steps may make things worse). Swapping these two will not affect lateness of other tasks. Out of the two tasks, the one with later deadline cannot be the global latest, because the earlier deadline one would be even greater. The only alternative is that neither is the global latest. Either way, switching them can only decrease lateness. After switching adjacent pairs enough, the whole schedule will be ordered as described. Any schedule can be reduced to this one while only improving it, so it's optimal.

3.3. Optimal caching

Given a list of things that will be stored in cache, in a given order, and a limited cache size, how do you choose which things to evict in order to minimize number of cache misses?

Evict the one that is only needed furthest in the future. Nontrivial proof? TODO

3.4. Huffman Codes

In making a good code, we want to give frequent symbols shorter representations while ensuring that one code is not a prefix for another. A "prefix code" is any code where one code is not a prefix of another.

An ideal character code minimizes \(\sum P(c)\ell(c)\), where P(c) is the probability of this character, and l(c) is the length of the code for that character.

Huffman's algorithm: We build a binary tree from the bottom up. The left child of each node has a 0, followed by the child code. The right is similar, with 1. Choose the two least frequent characters (using a priority queue), make a little subtree out of them (assigning 0 and 1 arbitrarily), and insert that subtree into the pqueue with its probability being the sum of its components. Repeat until the whole tree is built.

3.4.1. Proof of optimality

By induction on number of characters n in the alphabet. Certainly true for n=2. Now let n be arbitrary, and suppose the huffman tree for n-1 characters is always optimal (for any choice of character frequencies). Without loss of generality, we can reduce to the case where the newly added character has the minimum probability among all n characters.


3.5. Party problem

We want to have a party where everybody knows n people, but also does not know at least n people. That way they are comfortable but make new friends too. How do we select the maximum number of friends satisfying this criteria?

It's not trivial (ie, 0 people or all of them), because if there's someone with degree 1, we cannot add them, but still may be able to throw a party.

Start with the set of everybody. Until S stops changing, filter S down to those friends with degree at least 4 (in S only), then filter again to degree less than n-4 (still in S only). Once the loop is done, S is the set of friends to invite.

First, prove that S always contains the final set: Suppose S is a superset of the final set we're interested in, and then we perform one step of the loop. Any person with degree<4 certainly won't have degree>=4 in the final set, so they are safe to remove. Similarly, if there are less than four people the person doesn't know in the current S, then there are certainly less than four people who they don't know in the final result too.

Next, prove that the loop terminates, and the desired conditions are met at termination. At each loop cycle, S gets smaller or stays the same. It can only get smaller a finite number of times, and as soon as it stops getting smaller, the loop terminates, so it does always terminate. Finally, when it terminates, all nodes in S fit the criteria for inclusion in the final result. So final \(S\subseteq result\).

This is a different approach to a greedy problem. Most greedy problems add things one by one using a simple rule. To show correctness, you prove that each time you add something to the result, it remains correct, and that when the algorithm terminates, everything appropriate has been processed. In this party problem, we remove things one by one using a simple rule. At each step, we prove that nothing important was removed. Finally, we prove that the algorithm terminates at the correct state.

4. Associative Range Queries

Ie, to find the {max,min,sum,etc} of an arbitrary range in an array.

Naïve takes O(n) with O(1) preprocessing and O(1) update: explicit calculation on every element in the range.

Square root tree takes \(O(\sqrt n)\) with O(n) preprocessing and O(1) update: Split the array into \(\sqrt n\) many blocks of length \(\sqrt n\). Precalculate the operation on each block. Then to query, you calculate over however many full blocks are in the range (at most \(\sqrt n\), since that's how many blocks there are total), then over all the individual elements at the ends (at most \(2\sqrt n\)).

Segment tree takes \(O(\log n)\) with O(n) preprocessing and \(O(\log n)\) update: Form a tree, where the root node represents the entire array, the second level nodes each are for half the array, third level nodes for a quarter, etc. When querying, you'll have to look at at most 2 nodes on any level, for \(O(\log n)\) query. To update, you simplfy adjust all the segments including the modified element, of which there will be \(\log n\). Finally, we must show that creating the tree is truly O(n). First we create all the bottom level nodes, which takes time n. The next level takes time n/2. Next n/4. Etc, so it takes time about 2n to build the whole tree, for O(n).

A Fenwick, or BIT Binary Indexed Tree, is an easy to implement solution to range queries when the operation is not only associative but invertible. It is not possible to query ranges directly, but you can query the sum from 0 up to any arbitrary number, then subtract two results to get a range result.

The tree is an array the same length as the input data array. Represent the i-th index in binary, abcdef. This index in the Fenwick array stores the sum of all indices of the original array where, in binary, the last 1 of i is turned to a zero, and all zeroes after that last 1 take all possible values apart from all zero. For example, index 1100 in the Fenwick tree stores the sum of 1001, 1010, 1011, and 1100 from the original array. To find the sum up to an element, you keep removing 1s from the end of its binary representation and summing those elements in the Fenwick tree. Each time you do this, you get the sum "from the last 1". A cool trick to get the last 1 in the binary representation of a number is x&(-x) using two's complement.

How is a Fenwick tree any better than an array whose i-th entry is the sum of all entries before i in the original? The array has O(1) query vs O(log n) for Fenwick. However, the Fenwick supports O(log n) updates (just update all parents that you'd come across in the query), compared to O(n) updates for array.

How does Fenwick differ from a segment tree? The Fenwick tree doesn't have a true leaf node for every element. Rather, we can think of it as a tree shaped like so:

To find the sum up to *, we have to take the value at *, add it to the first t (sum of two lines), then add to the second t (sum of four lines below them), and you can imagine further ones.

Because we don't have to store a leaf node for every element, we save half the memory and also get (by a constant factor) faster operations. This is at the cost of many (half) of the elements not storing their own value, but rather some cumulative value, so we need to take differences to get any useful values.

(Before this, I tried to make a Fenwick tree where node 1100 stores the sum of 1100, 1101, 1110, and 1111, rather than the conventional 1001, 1010, 1011, and 1100. This can still work: You get cumulative sums from the end of the array, and to do a lookup for 0110 for example, you have to repeatedly add the lowest 1 instead of remove it: 0110, then 0110+0010=1000)


5. Pathfinding

5.1. Dijkstra's algorithm

Generalization of BFS to arbitrary nonnegative edge weights. Maintain a queue of adjacent, unexplored nodes (just like BFS), but use a priority queue on shortest known path to each node instead of FIFO to determine which node to explore next.

Also include a set of explored nodes; no need to explore them again, the first time we see them, we have the shortest path.

Works on directed and undirected graphs.

Negative weights can break Dijkstra's. That's because, with negative weights, the cheapest path may begin with a very expensive prefix, which Dijkstra's will not explore first.

Dijkstra's counts as a greedy algorithm.

Time complexity is \(O(n\log n+m)\), because for each node we have remove and add things to a priority queue (TODO: why exactly this?)

Slight modification: How would you find the shortest, cheapest path? First thing that comes to mind is to keep track of lengths, and break ties in the priority queue by length. If all the edge costs are strictly positive integers, then we can simply add 1/n to the cost of each edge (where n is the number of nodes). This doesn't cause a problem because any shortest path has at most n-1 edges in it, so the extra contribution due to the 1/n will sum up to less than 1.

Bidirectional Dijkstra: Do dijkstra's from start and endpoints, stop when they meet. (TODO why?)

6. Divide and Conquer

6.1. Solving recurrences

6.1.1. Substitution

Eg, for merge sort, we have \(T(n)=2T(n/2)+n\). A way to get a feel for it is to recursively expand it and watch for patterns. Once you have a hypothesis, try substituting it in for T and see if things work out. For instance, merge sort happens to be n log n, so let's see what happens:

\[n\log n=2(n/2)\log(n/2)+n\]

Letting the logarithm base be 2…

$$nlog(n/2)+n=nlog(n/2)+log(2n)=nlog(n/2)+nlog 2=nlog(2⋅ n/2)=nlog As desired. How could we have done it without knowing the base of the logarithm? After all, merge sort is in O(n log n) in any logarithm base. To do this, we would have needed to introduce a free variable constant, so the solution is actually \(kn\log n\) instead, and we would need to see if any k satisfy the equation.

6.1.2. Intuition

One way to think about it: "How does T behave when its argument is multiplied?"

For example, if you have \(T(n)=T(n/2)+O(1)\), you are looking for a function that, when we multiply its argument by two, yields itself plus a constant. Logarithm is such a function (and by adjusting the base, we can make the outputted constant whatever we like)

Another example: \(T(n)=2T(n/3)+O(1)\). Something which, when its argument is tripled, becomes doubled and emits a constant. Looking at just the doubling part, we have \(n^\log_3 2\), because \(3^{\log_3 2}=2\). As for emitting a constant, we should include a logarithm somehow. We don't want to multiply by a logarithm, because then we'll end up with \(n^{\log_3 2}\log n\), and upon tripling the argument, we end up with \(2n^{\log_3 2}\log n+n^{\log_3 2}\log 3\). So we didn't add a constant: We added a copy of \(n^{\log_3 2}\)! The right way is to add a logarithm: \(n^{\log_3 2}+\log n\). But the logarithm clearly doesn't contribute to the time complexity here, so we can drop it.

On the other hand, consider \(T(n)=2T(n/3)+O(n)\). In this case, we want to emit something linear when we multiply by three. In the simpler case \(T(n)=2T(n/2)+O(n)\), we just use n log n. The answer is \(n^{\log_3 2}+n\log n\), because plugging in 3n: (TODO: that's not exactly right.). Since n log n dominates, the whole thing is O(n log n). TODO: it's actually O(n) according to wolfram alpha. 2log3n-1

Now, what about \(T(n)=4T(n/3)+O(n)\) ? Without considering recombination cost, we have \(n^{\log_3 4}\). While \(\log_3 2<1\), we have \(\log_3 4>1\), which makes things interesting. Like last time, we do something along the lines of \(n^{\log_3 4}+n\log n\), but this time it's dominated by \(n^{\log_3 4}\).

Finally, what about \(T(n)=T(n/2)+O(n)\) ? We need something that stays the same, then emits a linear term, when multiplied by two. In fact, just n works here: \(T(2n)=2n=T(n)+n\). The recombination term here dominates anything actually related to the recurrence, so it's almost trivially just O(n).

6.1.3. Master Theorem

We are analyzing \(T(n)=aT(n/b)+f(n)\)

The sum of the cost of all the items (there's only one) on the first level is f(n). The sum of the cost of all the items on the second level is \(af(n/b)\), and the third level is \(a^2f(n/b^2)\), and so on. There are \(\log_bn\) many such levels (assuming that we stop recursion when there's only a single item, but the exact depth doesn't affect the analysis much). So the total cost is

\[\sum_0^{\log_bn} a^if(n/b^i)\]

Often, \(f(x)=cx^k\) for some constant. In this case, it's easier to analyze. The sum is now \(\sum_0^{\log_bn} a^in^k/b^{ik}\), where k is constant.

  • If \(a=b^k\), then each term of the sum is a constant multiple of \(n^k\). So the whole complexity is \(n^k\log_bn\in O(n^k\log n)\).
  • If \(a>b^k\), we can use the expression for the partial sum of a geometric series, namely: \(p\frac{r^{n+1}-1}{r-1}\). Applying it: \[\sum_0^{\log_bn}a^in^k/b^{ik}= n^k\sum_0^{\log_bn}(a/b^k)^i=n^k\frac{(a/b^k)^{1+\log_bn}-1}{(a/b^k)-1}\in O(n^k(a/b^k)^{\log_bn})\]. We have \((b^k)^{\log_bn}=n^k\) and \[a^{\log_bn}=(n^{\log_na})^{\log_bn}=n^{(\log_na)(\log_bn)}=n^{(\log a)(\log n)/(\log n)/(\log b)}=n^{\log_ba}\]. So, this is equal to \(O(n^{\log_ba})\).

    Interesting logarithmic idenity just shown: \(\log_ka\log_bk=\log_ba\). That is, the base of a logarithm, and the parameter of a logarithm, cancel out, while the remaining base and parameter join together.

  • If \(a

\[\sum_0^{\log_bn}a^in^k/b^{ik}\le\sum_0^{\infty}a_in^k/b^{ik}=n^k+n^k\sum_1^{\infty}(a/b^k)^i\in O(n^k)\] Because the infinite sum is a geometric series with rate less than one, so converges to a constant.

Versions of the master theorem work without assuming the recombination cost is a simple polynomial, see https://www.cs.cornell.edu/courses/cs3110/2012sp/lectures/lec20-master/mm-proof.pdf.

The other method, where we just try to find functions we can nicely substitute into the recurrence relation, isn't perfect because we don't prove that it's the largest such function, and sometimes it's hard to come up with a concrete function in terms of elementary functions which fits the recurrence relation exactly. The method described in this section puts an upper bound on the time complexity, which solves both problems.

6.2. Aside: Merge sort vs Quicksort

Memory: Merge sort naïvely requires you to create a new array to store each merge result. It's possible, but not simple, to create an in-place merge sort.

Locality: During merge, we have to look between the different merging arrays very frequently. With quicksort,

If we do quicksort with a randomly selected pivot, it has pretty much the same recurrence and by extension time complexity as merge sort.

6.3. Basic algorithms

Exponentiation for large exponents (esp. useful modular exponentiation).

Finding any one root of a function on an interval [a,b], given that f(a)<0 and 0<f(b). In 2d, keep bisecting the space, then checking endpoints to see if the IVT guarantees a root in that section. Always at least one subsection will have a guaranteed root by IVT, because of IVT on the root and induction. It is log(n) to find an interval of width \((b-a)2^{-n}\)

6.4. Largest consecutive subsequence

Given an array of positive real numbers, what consecutive subarray has the maximum product? Naïve is \(n^2\). But we can split in half, and for each half, calculate:

  • Largest subarray overall.
  • Largest subarray touching the left edge.
  • Largest subarray touching the right edge.
  • Product of the entire thing.

Combining the results is not too hard:

  • Largest overall: Max(overall from either half, right edge of left*left edge of right)
  • Largest touching left: Max(left of left, all of left*left of right)
  • Largest touching right: Max(right of right, all of right*right of left)
  • Entire thing: entire left*entire right

The base case is trivial.

(defun maxseq-divide-and-conquer (seq &optional (operation #'*))
  (labels ((op (&rest args)
             (apply operation args))
           (%%% (seq)
             (if (= 1 (length seq))
                 (let ((val (elt seq 0)))
                   (values val val val val)) ;; May not be totally right for all operations?
                 (let* ((left-length (floor (length seq) 2))
                        (right-length (- (length seq) left-length))
                        (left-array (make-array (list left-length) :displaced-to seq))
                        (right-array (make-array (list right-length) :displaced-to seq :displaced-index-offset left-length)))
                   (multiple-value-bind (b1 l1 r1 w1) (%%% left-array)
                     (multiple-value-bind (b2 l2 r2 w2) (%%% right-array)
                       (values (max b1 b2 (op r1 l2))
                               (max l1 (op w1 l2))
                               (max r2 (op r1 w2))
                               (op w1 w2))))))))
    (%%% seq)))

This is T(n)=2T(n/2)+1, which is O(n). An algorithm that needs to keep track of fewer things, at the cost of taking O(n log n) time, is to only record the maximal subsequence for each subproblem, and then manually calculate the largest one that goes between sections, which has T(n)=2T(n/2)+n.

There's something simpler we can do, though, suggested by one of the practice midterms. For each index in the array, we calculate the largest consecutive subarray which ends with that index. This is easy to do recursively, using the greatest product at the previous index. If the product at the previous index is greater than 1, then the greatest product at this index is the last one, times this one. On the other hand, if the product at the last index is <1, then the best at this index is just the current index.

(defun maxseq-linear (seq &optional (operation #'*))
  (loop with max = nil
        with result = nil
        with last = nil
        with lastseq = nil
        for item in seq
        do (progn
             ;; for only multiplication, we could simply check (> last 1)
             (if (and last (> (funcall operation last item) item))
                 (setq last (funcall operation last item)
                       lastseq (cons item lastseq))
                 (setq last item
                       lastseq (list item)))
             (when (or (null max) (> last max))
               (setq max last
                     result lastseq)))
        finally (return (nreverse result))))

6.5. Counting inversions

Here's another algorithm: Given two lists of movie preferences, how many pairs of movies are there where user 1 prefers one movie, but user 2 prefers another? Ie, how many "inversions" are there between two permuted lists? Like often, we split the array into two blocks, calculate the number of inversions within each block, then somehow calculate the inversions between blocks. The base case is easy: A single element has no inversions.

Given two blocks, how do we find the inversions that cross between them? Note that we are doing divide and conquer on two lists simultaneously – user 1's preferences, and user 2's. So, when we split into 2 blocks, there are actually four blocks: \(a_l,\ a_r,\ b_l,\ b_r\). The idea is to find \(|a_l\cap b_r|\) and \(|a_r\cap b_l|\), then multiply the two together. This is basically the definition of a cross-section inversion. So how do we find the cardinalities of these sets? naïvely, it's n2 and we're no better off than before. But now suppose we write our algorithm as an augmented merge sort, where just before merging (usually we'd merge al and ar, then separately merge bl and br), we do a "dry run" merge between al and br, then separately between ar and bl, which tells us how many elements are in common between the two!

The unfortunate extra requirement of this algorithm is that there is some (arbitrary) total ordering of the movies.

6.6. Finding the closest pair of points

Not about finding the closest point to a given point, but rather finding the two points whose distance is minimal throughout the whole data set.

Repeatedly split the world in half, then recursively find the minimum distance sigma in that half. Splitting in half requires you to sort the points by one coordinate, n log n. Then, we need only check along a strip of width 2*sigma along the split axis.

The base case is an n2 search, which is cheap if the square is small enough.

How do we efficiently check along the strip (recombination step)? Number of ways. One is to sort along the axis, then only check those within sigma on either side.

6.7. Median

6.7.1. Quickselect

We can divide and conquer finding the k-th smallest element in a list: Choose an element, partition into "greater" and "lesser" sets in O(n) time, and then choose either the greater or lesser set to continue exploring. If we choose the pivot such that the partitions are about half the size of the original set, the recurrence is \(T(n)=T(n/2)+O(n)\in O(n)\). The big problem is choosing the partition correctly. If we happen to choose it to be one third, then we have \(T(n)=T(2n/3)+O(n)\), still O(n). Due to this, randomly selecting the pivot at each step gives expected linear time, but the strict worst case is \(T(n)=T(n-1)+O(n)\) if we only eliminate one element at each step, yielding an overall runtime of n2. This randomized algorithm is called QuickSelect.

6.7.2. Median of Medians

It's critical to understand quickselect before MoM. If you believe that a random pivot gives expected linear time, then it's not hard to believe in a linear deterministic algorithm: You just need to be able to quickly find a reasonable pivot.

Split the array into consecutive segments of fixed length, usually 5 but can be more. Sort each segment. This takes linear time, because the number of segments is linear, and sorting a fixed length array is constant time. Then, we find the median of the medians of these segments (that is, the median of the third element of each sorted segment). The time complexity of this is recursive, so we'll consider it later. We have a guarantee that no less than 3/10th of the total array is less than or greater than this MoM. Equivalently, no more than 7/10th of the total array is greater than or less than this MoM. We have the (worst case) recurrence relation: \(T(n)=T(1/5\cdot n)+T(7/10\cdot n)+O(n)\) (the last O(n) is because we still have to perform the partitioning after selecting the pivot). How can we solve this? It turns out that it's enough that 1/5+7/10<1, but T isn't obviously linear, so this isn't a proof. It turns out that because the combination cost is linear that we can in fact can linearly combine the recursive Ts. We know that the total cost is no greater than

\[n\sum_{i=0}^{\infty} \sum_{j=0}^i \binom{i}{j}\left(\frac{7}{10}\right)^{i-j}\left(\frac{1}{5}\right)^j\]

(The inner sum is number of paths down the recursion tree to depth i, where j of them were 1/5 and others were 7/10). The inner product can be reduced using the good ol' binomial theorem, \((a+b)^k=\sum_{i=0}^k\binom{k}{i}a^ib^{k-i}\). So our sum turns into \(n\sum_{i=0}^\infty (7/10 + 1/5)^i=n\sum_{i=0}^\infty (9/10)^i=10n\). So the entire cost is indeed O(n).

Could we come to this final expression directly without using the binomial theorem? At the first level, our choices are 7/10 and 1/5, for a total first-level cost of (7/10+1/5). At the second level, we have 7/10(7/10+1/5)+1/5(7/10+1/5)=(7/10+1/5)(7/10+1/5).

6.7.3. Introselect

Quickselect is faster than MoM in almost all cases, but still, programmers would like a worst-case guarantee. Introselect switches from Quickselect to MoM based on a heuristic. The ones mentioned on wikipedia are:

  • Keep track of the size of the array section being considered during recursion. If it does not half after k many pivots, then switch to MoM.
  • Keep track of the sum of the partition sizes considered so far, and switch to MoM if it exceeds k*n.

6.8. Integer Multiplication

We can split the multiplicands into two sections with equal numbers of bits (approximately). Eg, \(x=2^{n/2}x_1+x_0\), and same for y. Then


This requires four multiplications and O(n) recombination, which turns out to be O(n2), for no improvement over the naïve multiplication strategy. But there's a trick:


So if we calculate \(x_1y_1\), \(x_0y_0\), and \((x_1+x_0)(y_1+y_0)\), we can find \(xy\) by just doing bitshifts and additions! This reduces the recursion to \(T(n)=3T(n/2)+O(n)\), which turns into \(O(n^{\log_2 3})\approx O(n^{1.585})\)

6.9. Matrix multiplication

Strassen developed slightly faster ways to multiply 2x2 and 3x3 matrices. To apply to larger matrices, use divide and conquer with block matrix multiplication formulas.

6.10. TODO Multipole Algorithm

Allows performing n-body gravity simulations or equivalent in linear time to the number of bodies! Another way to word it: For vectors \(v_1,\dots, v_n\), quickly compute \(Vx\) where \(V_{ij}=\|v_i-v_j\|_2^{-1}\).

7. Dynamic Programming

Solve the "n-1" problem then use that in the next result. Different than divide and conquer because you don't even attempt to split the problem into comparably sized subcomponents – only remove a constant number of elements at a time.

Another difference from divide and conquer is that we memoize results and may access them many times. DP is sorta like a graph search but with many nodes shared between paths, so it'd be a shame to recalculate all of them.

7.1. Knapsack

Each item has weight and value. Our pack has a maximum weight. How can we choose items to maximize the value without exceeding max weight?

It's 2n to consider all possible combinations of items. Instead, let's use DP. Enumerate the items. Our subproblems return the best subset of items after some point i in the enumeration that weighless than a given weight (which may not be the same weight as in the original problem).

Assuming weights take nonnegative integer values up to a maximum W, the time complexity is clearly bounded above by O(Wn), because that's the number of possible unique subproblems. However, W can be exponential in n; imagine that the weight of the i-th item is 2i. Then there are exponentially many different combinations of weights, and no good way to reduce them.

The knapsack problem is related to how to pay for something with a minimal number of coins. Because of the way they are, US denominations allow coin choosing to be a greedy problem. But it is not so for arbitrary weights. Sometimes it's better to pick two medium-value, medium-weight items rather than one high-value, high-weight item.

7.2. RNA

RNA has four letters, AUCG. It's not just a linear sequence, though – there are loops. We therefore define the secondary structure of RNA as a list of pairs (bi,bj) to indicate an extra connection (there is always an implicit connection bi to \(b_{i+1}\)). Rules:

  • At least three bases in between two paired bases. So, (bj,bj) implies j-i≥4.
  • No "crossing"s. So (bi,bj) and (bk,bn) implies i<k<n<j or k<i<j<n, but not i<k<j<n or k<i<n<j.
  • A-U and G-C pairs are allowed, nothing else.

Goal is to maximize number of these "base pairs" without breaking the rules. This minimizes the "free energy" in the molecule.


7.3. Subsets with equal sum

Given a list, can it be decomposed into n subsets, each with an equal sum?

The extremely naïve way is to check all permutations, in n! time.

We can do it in n3 with DP. Our recursive subproblem is, given a list, can it be decomposed into n subsets with given sums x, y, and z? If the list has one element, easy base case. Else, remove the first element, and spawn three subproblems, with that first element subtracted from each of x, y, and z separately. Return true if any subproblem does.

7.4. Levenshtein Distance

We can add gaps to a word and also just replace characters. How can we go from one string to another minimizing the sum of gaps and replacements? This is edit/Levenshtein distance.

Calculate and store Levenshtein distance for the i-th and j-th suffixes of each string. This given mn time and space. If we go from shortest suffixes to longest, we don't need to store the entire n*m memoize matrix, only those for suffixes of length k-1 where k is the length currently being considered, reducing to nm runtime and n+m space.

This is sufficient for calculating the Levenshtein distance, but what if we also want to recover the instructions to perform the "alignment"? You can store the partial instruction in each memoize location, but that's going to take a lot of memory – near the end of execution, the amount stored in each memoize cell will be n+m, and you're storing n+m many of them, for … O((n+m)2), no good.

Something better is to keep track of TODO

7.4.1. In terms of a graph

Each (x,y) has edge to (x,y+1) and (x+1,y) with weight one, then (x+1,y+1) with weight zero or one depending on whether the two characters are the same. It sometimes make sense to go straight up or down (skip) in order to find a "gully" or "rut" where you can go diagonally for a long way with weight zero.

7.5. Longest path in a DAG

Do DP using "longest path starting at node i".

7.6. Longest increasing (perhaps non-contiguous) subsequence

Do DP on the longest increasing subsequence ending at each array item. Worst case n2 if the whole sequence is increasing.

TODO: how to do in n log n with segtree?

7.7. Shortest path with negative edges

Goal: Shortest path from fixed node s to all other nodes.

We disallow "negative cycles" because by going in a loop you can continually improve your path. This also means we disallow undirected edges, because that's a directed cycle.

Cannot use Dijkstra's because of negative weights.

For every node, the best path to s is the best path to s from each neighbor plus the weight of the edge to said neighbor. (True because shortest path must go through some neighbor). But how do you implement? Say we take some node, then just try to calculate it, DP'ing through the neighbors. An obvious problem is that loops can form. If A explores B and B explores A, neither knows whether the best path goes through the other. There might be a way to work around this.

Another way is to consider path lengths in increasing order. Ie, first find the minimum cost to reach each node from s in at most 0 steps (infinity everywhere), then in 1 step, then in 2 steps, etc. The maximum number of steps to reach any node is |V|-1. At each step, we look at all edges in the graph, and if one side is less than the other side + cost, we update it. So O(|V||E|) time.

The algorithm can be enhanced to check for negative cycles by running another |V|'th cycle and seeing if that causes any weights to decrease even further, which should be impossible.

7.8. Dynamic Decision Problem (Bandits!)

The "original" DP problem was actually the "dynamic decision problem", which is really similar to the multi-armed bandit problem (we will analyze it without any probabilities). Formally, we have a graph with edge weights, and a "discount factor". What edges should we travel on to maximize our reward (off to infinity, guaranteed to be finite if discount < 1)

Similar to "Markov Decision Process"

Behavior is very dependent on the discount factor. With a small discount factor (close to zero), we prefer short-term payoffs.

DP solution: OPT can be defined using a sum across all the outgoing edges. The definition is pretty simple. It's problematic because we want to go out to infinity and have to handle loops, so there's not really any "base" of the DP recursion.

So we do something that's not exactly true DP. Rather, we do it just like the single source shortest path with negative weights: Iteratively update DP result at each node using results from the last step until convergence. In SSSP, we have some guarantee of convergence after a known number of steps (|V|-1).

At the t-th step, each node stores the highest reward that can be reached in t steps ending at that node.

Estimating: If \(\varepsilon_t=\max_u trueResult(u)-calculatedResult_t(u)\), then \(\varepsilon_{t+1}\le\gamma\varepsilon_t\). TODO: show

Time complexity: O(mT) where m is number of edges and T is the number of iterations. If we want result to within delta, then we need \(T=\log_\gamma (\varepsilon_0/\delta))\).

7.9. Traveling Salesman

Brute force is n! over all orderings of cities. (assume you can take a straight path between any two cities, i.e., graph is complete)

With DP, subproblem is best path going through all of S and ending at vertex v. At each level (including the top level, which has an "undefined" ending vertex), loop through each vertex, setting it as the ending vertex for a subproblem with \(S\setminus\{v\}\). TODO: how exactly?

8. Max Flow

8.1. Problem Description

Given a directed graph, an s source vertex, a t target vertex, and a numeric "maximum capacity" of each edge, assign a "flow" to each edge such that the total flow from s to t is maximized. Flow must satisfy:

  • Conservation: Flow in to a vertex is equal to flow out, except for s and t.
  • Within capacity: Flow on an edge is nonnegative and less than or equal to the edge's capacity.

Basic application example: Given a rail network, with max capacity on each line between cities, what's the maximum achievable thoroughput between two given cities? And how many trains should you send on which routes to achieve that maximum thoroughput?

8.2. Ford Fulkerson


Ford Fulkerson has pseudo polynomial time; at every iteration, the solution strictly increases, and cannot exceed the optimal solution. Further, each run takes O(m) time.

8.2.1. Improvement to weakly polynomial

First, select some upper bound D on the max flow (eg, sum all edges). Run Ford-Fulkerson, only including edges with capacity larger than D. Then cut D in half, repeat. There will clearly be log(D) top-level iterations. What's less clear is that each iteration is improved to m2.

8.2.2. Strongly polynomial: Edmonds-Karp Algorithm

An implementation of Ford-Fulkerson. Instead of picking an arbitrary augmenting path, choose the shortest one (by number of edges, disregarding capacity).

After augmenting on a shortest path, the distance from s to any other node either stays the same or increases. Removing "forward" edges clearly can't decrease the distance. Can adding backwards edges decrease the distance? In general yes, but not if we choose the shortest path. In a shortest path, the partial shortest paths are the shortest paths to their endpoints. Backwards edges are therefore not desirable, because the shortest path to the previous node was achieved using forward edges only.

Idea: Show that each edge will be considered in O(n) many augmenting paths. So there are O(nm) many iterations. And each iteration takes O(m) time (to perform shortest-path search), resulting in an overall runtime of O(m2n).

We associate each augmenting path with an edge, and then show that each edge will only be associated with a relatively small number of augmenting paths. The edge we associate with is (any) bottleneck edge on the path; an edge which, after augmentation, is fully saturated. Call it (u,v). After augmentation, there will be no (u,v) edge; only a (v,u) edge. What would it take for another augmenting path to use this new (v,u) edge? We know the distance to u along the first path was the shortest possible. Call it d(u). The shortest possible distance to v therefore is d(v)≥d(u)+1, since the shortest path to it went through u and then one more edge. The distance to reach u by going through v is d(v)+1, so d'(u)≥d(v)+1≥d(u)+2. That is, every time the (u,v) edge (in either direction) is used as part of an augmenting path, the s-u distance increases by at least two. So the edge can be considered at most |V|/2 times (the maximum distance in a graph with is |V|).

Therefore, O(n) many augmenting paths per edge, O(nm) augmenting paths total, each takes O(m) time, so O(nm2).

8.2.3. Dinic's Algorithm


8.2.4. Link Cut Tree


9. Complexity Classes

9.1. Polynomial relationships between problems

For problems A and B, we say \(A\le_P B\) if A can be polynomially reduced to B: A can be solved using a polynomial number of calls to B, plus a polynomial amount of extra work. We say \(A\le_P^1 B\) if A is solved with a single call to B. Maybe \(\le_P^{O(1)}\) is what we really want.

For example, bipartite matching is \(\le_P^1\) maxflow.

Any polynomial time algorithm is \(\le_P\) any other polynomial time algorithm; you can just ignore the other, and do your algorithm using the extra work available to you.

A modern field called "Fine-grained complexity" studies differences between polynomial time algorithms. But in this course, we don't care.

More interesting examples relate non-polynomial algorithms. Eg, independent set: Is there a k-sized subset of a graph such that no two vertices are connected? Vs Clique: Is there a k-sized subset which is complete (all pairs connected)? Simple: To find independent, find complement of edges (takes |V|2 time), then solve clique. So independent \(\le_P^1\) clique.

Another more interesting example: Determining existance of a vertex cover of size at most k, compared to determining the size of an independent set. Lemma: The complement of an independent set is a vertex cover and vice versa. Every vertex in an independent set connects to a vertex outside that independent set, so the complement is a cover. Further, the complement of any vertex cover is independent: If there is an edge inside the complement, then it would not be in the vertex cover, contradicting that it's a vertex cover.

9.2. NP

Easier to define for yes/no "decision problems". Let X be the set of inputs for which the problem answer is "yes". A problem is in NP if we can create a polynomial-time certification algorithm C(x,t) such that the following are equivalent:

  • \(x\in X\)
  • There exists a certificate t such that \(C(x,t)=yes\).

Example: Maxflow. Of course, maxflow is polynomial, so we can verify it trivially…but there is a constant time way to check as well: Give the cut and the flow, then just check they are equal. We know this happens only for the maxflow.

The "certifier" for a polynomial decision algorithm can be trivial: Just ignore the certificate, run the algo, and check that the answer is equal to x.

There's a more interesting way to think about this "trivial" certifier. Instead, we can think of the certificate being the algorithm itself. Ie, the certifier is being told "Here's the solution. And as a certificate of correctness, I provide you this algorithm which will return the same result and which you can run in polynomial time."

9.2.1. Co-NP

Perhaps surprisingly, there are problems for which polynomial-time-checkable certificates always exist for affirmative results (ie the problem is in NP), but where (at least as far as we know) it is not possibly to come up with certificates checkable in polynomial time for negative results.

In light of this, it is meaningful to define "Co-NP" as the set of problems where we can certify "no" results but not "yes" results. Taking an NP problem and inverting the output turns it into a Co-NP problem.

It's an open question whether NP = Co-NP, ie, whether the existance of certificates for yes implies certificates for no. NP and Co-NP certainly overlap (for example, all of P. Though there are also known examples not in P), but as usual it's hard to show that some certification algorithm does not exist.

9.3. NP-complete

A is NP-hard if \(\forall B\in NP,\ B\le_P A\). Further, A is NP-complete if A is NP-hard and also in NP.

How do you prove something is NP-complete? First, you need a more formal model of computation in order to formally define polynomial time. Perhaps the easiest problem to show NP-completeness for is 3-Sat. If a problem can be solved in polynomial time on a Turing machine, it can be solved using a boolean circuit with a polynomial number of gates. We can turn those gates and the relationship between them into a boolean expression with no more than 3 elements per clause!

Thus, we take the polynomial time verifier/certifier, convert it into a 3-sat style expression, then run 3-sat on it to find an input which satisfies the verifier.

Some NP-complete problems which might show up on the exam:

  • 3-Sat
  • Subset Sum: Is there a subset with a given sum? (equivalent to a fixed sum, say 1)
  • Is a graph 3-colorable?
  • Independent Set (finding a set of vertices of a given size such that no two vertices in the set share an edge)
  • Vertex Cover (set of vertices of a given size such that all edges in the graph touch one of the vertices)
  • Clique (set of vertices of a given size such that every pair in the clique has an edge)
  • Set cover/packing (given list of subsets, find a set of k subsets that covers the entire set)

9.3.1. 3-Sat

Boolean satisfiability means, given a formula using ANDs, ORs, negations, and (possibly repeated) symbols, can we set each symbol to true or false in such a way that the whole expression is consistent? Eg, \((x\wedge y)\vee z\) is satisfiable: Let x and y be true.

Boolean satisfiability is NP-complete.

Conjunctive normal form means that there are ORs in the parenthesis and ANDs between the parenthesis, eg \((a\vee b)\wedge (c\vee \neg d\vee e\vee \neg f)\wedge (g)\).

3-Satisfiability is conjunctive normal form where every parenthesized term has at most three terms. It can be shown that sat reduces polynomially to 3-sat, so we usually deal with 3-sat because of its relative simplicity.

10. Approximate Algorithms

Usually don't approximate decision problems – instead, turn them into related optimization problems and find a solution that's close to the truth.

For vertex cover: Choose an edge arbitrarily, add both edges to the vertex cover, repeat with an uncovered edge until done. Approximation includes at worst twice as many vertices as the minimal cover. Why? For each selected edge, at least one of the endpoints must be selected eventually…we just do both!

For set cover: Given a set of subsets, what's the minimal set of those subsets which covers the overall set? Greedy: At each step, pick the set covering the greatest number of uncovered vertices. Let t denote the actual (optimal) solution. Then, at any point, there must be a choice which covers at least 1/t of the remaining vertices. Suppose there isn't: Then it would take more than t many sets to cover the set of remaining vertices, let alone the entire set! Thus, at each step, we eliminate at least the fraction 1/t of the vertices. Once this value is less than 1, we know we've covered the entire set (if there's no subset remaining with at least one uncovered vertex, there are no more uncovered vertices).

How many steps does that take? Ie, what's the smallest k satisfying \(n(1-\frac{1}{t})^{k+1}<1\)? We have \((1-\frac{1}{t})^{k+1}\le e^{-(k+1)/t}\), and \(ne^{-k/t}<1\). If \(k=t\ln n\), then \(ne^{-k/t}=1\), and increasing k any further causes it to go strictly less than one, so the \(k=t\ln n\): Our approximation algorithm approximates to within a multiplicative factor of ln(n), where n is the size of the entire set.

11. Homework guidelines

When asked to come up with an algorithm, prove:

  • Correctness
    • Termination
  • Runtime

Collaboration is allowed on homeworks, but only for sharing ideas; don't show anyone your written homework, or write it while in a group.

Author: Mark Polyakov

Created: 2022-03-17 Thu 01:04