# CSE 490Q Notes

These are Mark Polyakov's notes from taking Computer Science 490Q, Introduction to Quantum Computing (a special topics course), at the University of Washington during Autumn 2020. I didn't feel very confident about the contents of this course and ultimately chose to focus on my other courses this quarter, so there will be a lot of omissions and maybe even some misconceptions below.

## 1 Motivation & Classes of Computation

### 1.1 Church-Turing Thesis

Any reasonable model of computation is equivalent to lambda calculus and turing machines.

### 1.2 ECTT (Extended Church-Turing Thesis)

(I don't think this one's been proved) not only is everything computable by one computer computable by every other computer (the chuch turing thesis), but everything computable in polynomial time on one reasonable machine is also in polynomial time on other reasonable machines.

### 1.3 BPP

Bounded, Probabilistic Polynomial time. Like P, but the computer can generate truly random numbers and is allowed to produce a probabilistic result (ie, it's not always right). Most algorithms thought to be unique to BPP were eventually reduced to P, so its presumed to be equivalent to P.

### 1.4 CS Affects Physics

If a certain physical system, encodes an NP-complete problem, the the time the physical process takes is predicted to be exponential to the size of the system, for example, because every physical system is expected to be a "reasonable" model of computing.

Another example: Determining how soap bubbles would form seemed to be an NP complete problem. So, by observing the formation of bubbles, we can solve NP problems, right? Turns out that the way the soap bubbles form isn't actually the NP-complete problem that it was suspected to be – the positions or sizes of the soap bubbles or whatever don't exactly match up with the computed solutions, so the problem physics is "solving" likely isn't

### 1.5 Can we prove CS affects physics?

Church and Turing's models were proven to be equivalent because they can simulate each other. We can prove that computation is equivalent to physics by showing that physics can simulate a turing machine and that a turing machine can simulate all physics, in polynomial time.

Turing machines can simulate macro-scale laws in polynomial time (einstein, newton, maxwell, etc), but not quantum stuff – seems exponential in the amount of quantum things being simulated.

### 1.6 Quantum Computing violates ECTT

But, no substantial quantum computer has been built, so we haven't quite disproved the ECTT yet.

### 1.7 BQP

Bounded-error, Quantum-polynomial time. Once again, results are not 100% for sure. Our current understanding of physics suggests that BQP can simulate the universe.

## 2 Computing with Linear Algebra

Linear algebra is not specific to quantum! Also possible for probabilistic computation.

Three ways to represent 1/2 chance of heads, 1/2 chance of tails:

Matrix notation: [1 2]

Multiplication by unit vectors: 1/2*eH + 1/2*eT

Dirac Notation:

​|v> which means column vector, Nx1 (| = many rows, > = one column)

<v| means row vector, 1xN. v doesn't inherently have an orientation; it's simply a vector with 1 in one dimension or the other, and this coerces it into a row vector.

Then, like in arithmetic, things are multiplied by being put next to each other. Eg, <v| |w> means vt times w (assuming they both start as column vectors), or v dot w. <v|w> is shorthand.

​|w><v|, ie w vt, results in a matrix.

Think of data as flowing from right to left. For |v>, a single data point enters on the right, is processed by v, then a vector comes out. For <v|w>, a single point comes out, some weird shit happens in the middle, and a single value comes out. |w><v| turns a vector into a matrix.

Back to the coins: 1/2|eH> + 1/2|eT>.

We can use |1> to indicate the first basis vector [1 0] (as a column) because the special dirac notation disambiguates it from the numeral 1. (When representing random bits, |0> = [1 0] and |1> = [0 1])

So, with coins: 1/2|H> + 1/2|T>

A random bit has some probability of being 1 and some probability of being zero. These probabilities must add to one.

### 2.1 Random Bit Operations

• Swap (x): [ 0 1 \ 1 0 ] swap probabilities of one and zero
• Force 1: [ 0 1 \ 0 1 ] always 1 (second element is one). Essentially reset.
• Force 0: [ 1 0 \ 1 0 ]
• G (hypothetical, not useful): [ 1/2 0 \ 1/2 1 ]. Ie, G|0> = 1/2|0> + 1/2|1> and G|1> = |1>.

Note that any bit operation that can be represented as a matrix properly models the effect of probability from the previous results.

A matrix is a valid bit operation if it keeps the state valid (ie, nonnegative probabilities and sum to one). This can be preserved by keeping all matrix entries positive and having columns sum to 1. All valid matrices follow these requirements:

• If any matrix element is negative, a positive multiplier in that column only will result in that component being negative.
• If any column does not sum to one, a one in the multiplier for that column only will result in an output that does not sum to one.

Furthermore, these requirements always result in valid matrices:

• Any two positives multiplied is positive.
• Split the input vector up into vectors with all zeroes except in one position. When multiplied with the matrix, the sum of the values of the result vector will be equal to the non-zero value of the input vector (if the non-empty position of the vector was 1, the output would sum to 1, so this result comes from linearity). Since the output vectors for the component-wise input vectors sum up (by linearity), and since the components of the input vector sum up to 1, the combined output sums to 1.

Such a matrix is "stochastic"

### 2.2 Multiple random bits

If you have two bits, each combination (eg, 10 or 11) has its own probability – not just a per-bit probability. Eg, it is possible 11 and 00 are valid states even if 10 and 01 are not, so we store four separate probabilities. I'm not sure if non-independent situations will actually be used in this class (I have a hunch they will).

#### 2.2.1 Tensor Product

Symbol: ⮾. Intuitively, if you have two independent probabilistic systems, the tensor product gives a probabilistic description of the combined system. Eg, two separate coins with different probabilities, the tensor product gives the probability of both heads, both tails, first head second tail, etc.

Tensor product of two column vectors of equal height is a column vector with 2*height with elements a1b1, a1b2, …, a2b1, a2b2, …. It's almost a cartesian product!

The tensor product is linear (scalars can be factored out, and distributive over vector addition), which follow almost trivially from the same properties of scalar multiplication. Though the commutation of a tensor product represents the same system, the actual vector returned will be different.

Notation: |x> ⮾ |y> = |x,y>

Tensor product is logically similar to just making an ordered pair of two vectors, but by taking up more "space" it contains information of every possible combination.

#### 2.2.2 Kronecker Product

The tensor product does not actually require the elements of the output vector to be in the order I suggested by my component-wise definition. That's the Kronecker Product. The tensor product doesn't care very much about the order of the new components; it just "contains" an element for each combination of components from the inputs.

#### 2.2.3 Kronecker Product of Matrices/Operations

If F, G are matrices and u, v are vectors, then we want (F⮾G)(u⮾v)=(Fu)⮾(Gv). First note that the 2x2 matrices turn into a 4x4 matrix. It is pretty easy to see how the diagonal values of the matrix are transformed: The element of F that would process u1 into u1new should be multiplied against all components of u⮾v that contain u1 as a factor (by the kronecker product, the first and second components). The values off the diagonal rely on F and G being stochastic matrices; If we want to apply a multiple of a2 to a1, we can determine a2 by simply summing all the components of u⮾v that include it as a factor; $$a_2b_1+a_2b_2=a_2(b_1+b_2)=a_2$$.

In the 2x2 case, this yields F⮾G:

 F11 G11 F11 G12 F12 G11 F12 G12 F11 G21 F11 G22 F12 G21 F12 G22 F21 G11 F21 G12 F22 G11 F22 G12 F21 G21 F21 G22 F22 G21 F22 G22

Although I derived this matrix by thinking about the results I want, note that there is a simple structure: The factors have the "large scale" structure from F and "small-scale" structure from G. This corresponds to the way that many consecutive values in a⮾b share the same coefficient from a, while the coefficient from b changes with every component.

The factors of b repeat themselves in a very small "area", so a small block matrix being repeated throughout the larger matrix describes it. Each small matrix describes how the factors from b move around within the same factor from a. But to swap around values from a requires global knowledge.

How would you construct a matrix for three bits? Apply the Kronecker product associatively. A 2x2 operation matrix ⮾ a 4x4 matrix = 8x8 matrix; the 4x4 matrix is repeated 2x2 times.

#### 2.2.4 Large

Although the matrices get large when you have many probabalistic, you don't need to calculate a fully evaluated kronecker matrix in order to carry out probabalistic computing. If you have 100 (weighted) coins, with say 5 operations on each coin's probability, you only have to perform 500 2x2 matrix multiplications then 100 RNGs, while, not even thinking about the kronecker product of the operations, the tensor product of the probabilities alone would be a column vector 2100 long! Carrying out the computing is sampling the distribution.

When we run a quantum algorithm, the qubits actually have a probability of being ones or being zeroes.

Question (probably to be answered next lecture): I understand that each qubit has a probability of being either one or zero, but the space to store the probabilities per-bit is only n, not 2n. How does quantum take advantage of the 2n probabilities for each combination of possible bit states? Obviously that can't be included in the classical output, so somehow it generates this very large space, does something with it, then extracts the interesting parts?

## 3 Quantum Computing Model: The Basics

The linear algebra computing we've been describing so far is called "BPP". Stochastic matrices have columns that sum to 1, and the sum of the components of a valid state sum to 1.

In BQP, the components of state vectors and operation matrices are complex. They can have negative real parts, even. And rather than requiring that columns sum to 1, we require that the square of the magnitudes in each column sum to 1. This is the 2-norm, while the BPP uses the 1-norm (sum). We don't square root the result because 12=1.

A qubit has two complex numbers whose 2-norm is 1.

In dirac notation, 〈x| is the complex conjugate and transpose of |x〉. A † symbol (dagger) is an explicit way to transpose and conjugate a matrix. 〈x|x〉 is $$\bar \textbf{x}$$ dot $$\textbf{x}$$. Recall that $$(x)(\bar x)=|x|^2$$, so 〈x|x〉=|x|2.

Rather than a "stochastic" matrix, we call quantum operation matrices "unitary matrices". TFAE:

• U is a unitary matrix.
• | u|x〉| = | |x〉 | (for any p-norm)
• Ut U = Identity
• Any eigenvalue of U has norm/magnitude 1. If it were not so, then an eigenvector would have the sum of its probabilities be changed!
• |det(U)|=1

I think stochastic matrices (1-norm) are also unitary? Are those defintiions equivalent if higher norms are used, too?

The definition of unitary matrices only makes complete sense when they are defined over the complexnumbers. So they are not "stochastic".

The second and third are equivalent becuase $||U|x\rangle ||^2=(U|x\rangle)^\dagger(U|x\rangle)=\langle x|U^\dagger U|x\rangle=\langle x|x \rangle=||x||^2$ Which is 1 if $$x$$ is a valid state vector to begin with.

The tensor product of two quantum states is a valid quantum state. $||a \otimes b||^2 = ||[a_1b_1, a_1b_2, ..., a_2b_1, a_2b_2, ...]||^2 = (a_1b_1)^*(a_1b_1)+... = a_1^2(b_1^*b_1+b_2^*b_2+...)+a_2^2(...)+... = (a_1^2+a_2^2+...)(b_1^*b_1+b_2^*b_2+...)=||a||^2||b||^2$ TODO: something's wrong with the $a$s at the end.

To show that $$U\otimes V$$ is a unitary matrix for unitary matrices U and V, show that $$(U \otimes V)^\dagger(U\otimes V)=(U^\dagger U)\otimes(V^\dagger V)=I$$, one of the equivalent definitions of a unitary matrix.

Property: $$(a \otimes b)^\dagger=a^\dagger \otimes b^\dagger$$.

### 3.1 Realizations of Qubits

• Polarizing photons: They polarization direction consists of two complex numbers. A complex system of smoke and mirrors does stuff to compute with them.
• Electron Spin: Direction also described with multiple complex numbers.
• Certain systems (harmonic oscillators?) exist in multiple energy states, with complex numbers describing the "combination" of the energy states that it's currently in.

## 4 Quantum Computing Model: Continued

If $$U$$ is unitary, so is $$U^\dagger$$. $$U^\dagger$$ is the inverse of $$U$$, because $$U^\dagger U=I$$! (I should've noticed that). This is not usually the case for a stochastic (1-norm) matrix. For example, [ .5 .5 \\ .5 .5] is clearly singular. . If you know the output state of a quantum computer, and the operations that were carried out, you can transpose all the operations and multiply to get the input state. But if quantum computers are reversible, and classical computers are not, are quantum computers more restricted?

But we can reverse classical computation by…just storing the inputs to each stage permanently? I'm pretty sure that's cheating.

When we carry out probabilistic computing, our matrices tell us probabilities, but the output bits are in definite states. We run it many times to determine the probabilities. But a quantum computation results in bits that actually contain the 0 and 1 states to different extents; they're in superposition. We coerce it to a bit with probability of 1 equal to q|1〉 and probability of 0 equal to q|0〉, where q is the qubit's state.

A diagonal matrix with all diagonal values being $$e^{i\theta}$$ is unitary. This "rotates" all the qubits, but the magnitudes of their |0〉 and |1〉s stay the same. It would be fair to ask, what's the point of phase at all if it doesn't affect magnitude? Maybe answer: The phase difference between |1〉 and |0〉 matters, but not the "bias" phase they both share. But we just don't know what that phase difference is used for.

Partial Measurement: We can observe certain qubits without others. You can't observe only certain combinations of qubits (eg, it doesn't make sense to measure |01〉 only – it would tell you things about 00, 10, and 11 as well – so you'd just measure both qubits).

If you have a vector of multi-qubit state, how do you model a single-qubit measurement?

1. Determine the probability that a certain qubit will be measured as one or zero by factoring the tensor product. Eg, if we have a state of 2 qubits with |00>, |01>, |10>, and |11>, we can turn it into the form |0>⮾|x>+|1>⮾|y> for some x and y single-qubit states, using simple rules of distributivity and stuff over tensor product.
2. Determine what the resulting state will be when one of the qubits is measured as a certain value. From the equation above, say that the first bit is observed as 1. Then the state of the other bits is certainly related to |y>, but |y> on its own is not a valid state vector and must be normalized by uniformly scaling up its components until the sum of their squares is 1 ($$|y\rangle * 1/|y|$$, where the norm is the square root of the sum of the squares of the components of y. The sum of the squares of the partial state add up to $$|y|^2$$, so we want to increase this sum by $$1/|y|^2$$, which is achieved by multiplying each component by $$1/|y|$$, because this causes the square of each component to be multiplied by $$1/|y|^2$$, which we can then factor out). This does not violate any rules when you start manipulating the scaled y as a complete state; all our theorems depend only on 2-norm magnitude of the state, not any specific dependencies between the individual components.

## 5 Product States & Entanglement

Many multi-qubit states are not "independent" combinations of qubits. When a multi-qubit state is a combination of independent probabilities, it is a "product state". In these cases, we don't need to store 2n complex numbers – just n, then we could do tensor product later. But in "entangled states", we need 2n. This is quantum!

Einstein, Podolski, and Rosen studied the aptly named "EPR States" such as .707|00〉

• .707|11〉, which is one of many "Bell States" (maximally entangled states). .707|10〉
• .707|01〉 is another Bell state.

How do we operate on an initial, non-entangled state to create an entangled state? We obviously cannot tensor together two operation matrices, (U⮾V)(a⮾b)=(Ua)⮾(Ub) which is a product state. One of the simplest examples: CNOT, toggle the second bit if the first bit is 1. But the operation carried out on one bit is dependent on the state of another bit, so the operation matrix is clearly not a tensor product. The matrix:

 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0

If the input vector is [ 00, 01, 10, 11 ]

When developing intuition for the Kronecker product of operations, we said that the operation on the right hand of the matrix only needs local information, hence why it is duplicated as a block matrix throughout the product matrix. But breaking this CNOT matrix into blocks, it is clear they are not scalar multiples – the effect is only achieved with global knowledge.

Previously, we talked about how to transform a state if it is partially observed (ie, some but not all qubits are observed). We were able to "factor out" the observed bit using the tensor product to determine what the other bits should be. And, interestingly, this factoring is still possible; recall we were able to perform the factoring without any information about how the state was created from tensor products, so it need not be created from tensor products.

Eg, for .707|00〉+.707|11〉, we decompose into .707|0〉⮾|0〉+.707|1〉⮾|1〉. See how tensor product here doesn't mean anything about independence, because there's a separate tensor product for each possible state of the first bit.

"Marginal" Distribution: We can determine the probability distribution for a single qubit pretty simply, by adding the complex numbers of all states where that qubit is 0, and making that the "probability" of it being zero, and proceed similarly for 1. In fact, we have to do this as part of our "decomposition" used in modeling a single qubit measurement. Then he said this doesn't work for qubits?

Recall that qubits actually store the probability distribution itself, and are not "actually" one nor zero. So if you have an entangled pair, then observe one half of it, the other half might be resolved into one state or the other, or the probability distribution might change.

When you look at part of an entangled state, it acts like its in a probability distribution between the different states that part might be in, depending on what happened to the other side.

## 6 The Bloch Sphere

Allows full visualization of how one-y or zero-y a qubit is, as well as the phase difference between the one and zero components. Note that (as long as we don't care about global phase), any qubit can be represented by the following equation: $\cos(\theta/2)|0\rangle+e^{i\varphi}\cdot\sin(\theta/2)|1\rangle$ Note that this qubit is valid because $$\sin^2 x+\cos^2 x=1$$ and $$\|e^{i\varphi}\|=1$$.

Then, we define the Bloch sphere like so: Start with a vector pointing straight up. Let the x-axis point out of the page, the y-axis point to the right, and the z-axis point up. Let $$|0\rangle$$ be at (0,0,1). Then rotate it around the y-axis (so at first it comes toward you) by θ. So if the bit is $$|0\rangle$$, don't move it at all, but if it is $$|1\rangle$$, it points straight down (0,0,-1). Then, rotate it around the z-axis by ϕ, so that at first it goes right.

• |0> = (0,0,1)
• |1> = (0,0,-1)
• -|0> = (0,0,1) (phase 180, but who cares)
• -|1> = (0,0,-1) (phase 180, but who cares)
• $$1/\sqrt2\cdot(|0\rangle+|1\rangle)$$ = (1,0,0)
• $$1/\sqrt2\cdot(|0\rangle-|1\rangle)$$ = (-1,0,0) (phase 180)

Basic transforms in terms of what they do to the Bloch sphere:

• Z=[ 1 0 \\ 0 -1 ]: Inverts the second term, i.e, adds 180 to phase, inverting both x and y coordinates.
• X=[ 0 1 \\ 1 0 ]: Inverts the z coordinate (flips around xy-plane) (not gate)
• H=1/sqrt(2)*[ 1 1 \\ 1 -1 ]: Rotates 90 degrees around y-axis, then flips over x-y

### 6.1 Global Phase

It's so easy to get confused. After all, |1> and -|1> seem to be the same apart from a global phase, but |0>+|1> (i.e., |+>) is not the same as |0>-|1> (i.e., |->). So we cannot just interchange parts of the state, even if it seems like those parts are the same except for global state.

You see, global phase has to do with measurement. No matter how much you manipulate a state, you cannot detect anything about its global phase. But if you look at a superset of this state, with additional qubits, then you can recover the "global" phase of the little part you were looking at. So when performing calculations on some set of qubits that are a subset of the state you might have to consider later, you should not discard phase information.

More specifically, it's safe to ignore the global phase of a state that is not entangled with anything that you care about. So if you have |a>⮾|b>, the global phase of |a> and |b> can be safely ignored, because it will turn into a global phase on the entire state when you expand the tensor product (tensor product is linear to scalar multiplication, so we can factor the phase out of the product).

## 7 Matrices from Lecture

F1:

 0 0 1 1

G:

 1/2 0 1/2 1

X (not):

 0 1 1 0

## 8 Notes for [2020-10-14 Wed], Product States II

The "more mysterious" parts of Quantum Mechanics. Oh boy.

A product state can be represented as a tensor product of some number of qubits. The qubits are not entangled whatsoever.

An entangled state has some probability connections between qubits. Without entangled states, you could just store each qubit separately in polynomial space on a classical system.

An EPR pair is $$\frac{1}{\sqrt2}|00\rangle+\frac{1}{\sqrt2}|11\rangle$$, one of the maximally engangled Bell States, because if you measure one qubit, it gives you 100% of the information about the state of another qubit. More accurately, neither qubit can be described on its own using only a quantum state.

That doesn't mean entangled qubits can't be described. Recall that a qubit state isn't a set of probabilities; the qubit, before its measured, actually has those coefficients tucked away somewhere, in their full precision. When you try to describe a single qubit $$a$$ that's entangled with qubit $$b$$, the state of $$a$$ depends on what happens when you measure $$b$$. But what happens when you measure $$b$$ is probabilistic. Thus we need a probability distribution between two different states of $$a$$, one state that will occur if $$b$$ is measured as zero, and another if $$b$$ is measured as one.

Although we think about describing the state of an entangled qubit as a "probability distribution between quantum states", the math is a bit simpler; since quantum states appear probabilistic when we measure them, we can combine the "quantum state" part of the probability and the "entanglement probability distribution" into a single probability. See slide 5 for an example of the math. It's done mainly by "factoring out" the qubit of interest from all the others: $\sum_{\mathrm{all\ states\ of\ other\ qubits}}(\mathrm{state\ of\ all\ other\ qubits})\otimes(a|0\rangle+b|1\rangle)$, where $$a$$ and $$b$$ will differ depending on the current "state of all other qubits". The probability of the state $$a|0\rangle+b|1\rangle$$, then, is the amplitude of the "state of all other qubits" (which might be, eg, $$|0111001100\rangle$$). You might need tensor products on both sides of the qubit in question (here I assumed the qubit was the "last" one in the state), but the technique is the same.

Nothing "happens" to one entangled qubit when you mess with the other. That's the way the probability distributions work. That being said, if you measure one qubit of an entangled qubit, it does give you more information about the other qubit – but information about the way it already was.

It's possible to apply a single-bit operation to an entangled bit. It's equivalent to applying that operation matrix, tensor product'ed with identities on both sides, to the entire state (ie, the same thing as applying an operation to an un-entangled state).

A mixed state contains qubits that are entangled with qubits outside that mixed state. Put differently, "mixed state" means "probability distribution between quantum states". Any other state is a pure state, which I think is the same as a product state.

The monogamy of entanglement states that if two qubits are maximally entangled, neither can be entangled whatsoever with anything other qubit. If you and a friend have maximally entangled qubits, you know that nobody else can have a qubit entangled with them. Consider the "GHZ" state: $$\frac{1}{\sqrt2}|000\rangle+\frac{1}{\sqrt2}|111\rangle$$. If you measure any one bit, you end up with a probability distribution on the other two: 50% chance of measuring $$|00\rangle$$ and 50% chance of measuring $$|11\rangle$$. This is not a "bell state"; it's a superposition of two product states.

Any quantum operation that can turn a product state into a mixed state (i.e., it can entangle qubits) cannot be written as a tensor product of one-qubit unitary operations. The interesting operations are the non-product ones.

When talking about quantum operation matrices, we will restrict ourselves to matrices smaller than some certain size (eg, $$10\times10$$, though a matrix that operates on only 2 qubits at once is enough to do anything you can do with larger ones), otherwise constructing those matrices could be expensive, since the number of elements grows as $$2^n$$ with the number of qubits, and it's physically very difficult to construct a quantum computer that acts on many qubits at once.

## 9 Quantum Circuits

A "classical circuit" is stuff like AND, OR, NOT, etc. All classical circuits can be build using just NAND gates, and similarly, there's a set of quantum operations that can do anything. One of the simplest (but not the smallest): The union of all possible 1-qubit operations and CNOT.

There are finitely many (two) reversible one-bit classical operations: NOT and identity. But there are clearly infinitely many one-qubit operations, since you can rotate by arbitrary amounts, for example.

The {one-qubit stuff, CNOT} set is actually implemented with Ion Traps! Single-qubit operations are implemented by firing a laser at a specific atom at some frequency that changes its energy state, sometimes. To CNOT a couple atoms, a laser is fired at one in such a way that, since the ions are levitated in a magnetic or electric field, to jiggle back and forth. The electric interactions between the ions cause all of them to jiggle based on the jiggle of the first one. Then a laser is fired at the second atom, and since whether the laser hits the atom straight on is dependent on the jiggling of the atom, which is turn dependent on the jiggling of the first atom, the state of the second one becomes entangled with the first. Apparently this is sorta sketchy so people are moving away from Ion Traps these days.

Qubits have two degrees of freedom; Given the amplitude and phase of either $$|0\rangle$$ or $$|1\rangle$$, we can figure out the amplitude of the other (norm=1) and we don't care about the phase of the other one, since we know the local phase.

We can restrict 1-qubit operations to H and T and use them to approximate (not exactly equal? due to finitely many multiplications?) any 1-qubit operation.

For efficiency calculations, we'll assume that, if we limit ourselves to a certain maximum size of matrix, any unitary operations using a matrix within that size takes a fixed amount of time.

A physical quantum computer might, for example, need two qubits to be next to each other in order to CNOT them. But that's a compiler level thing: If you try to CNOT two far-apart qubits, the compiler will insert swaps to get them locally before applying CNOT.

We know the mixed state of 50% chance of $$|0\rangle$$ and 50% chance of $$|1\rangle$$ is the same, under observation, as a single qubit state $$\frac{1}{\sqrt2}|0\rangle+\frac{1}{\sqrt2}|1\rangle$$. However, there are certain unitaries we can apply to the qubit that will allow us to see the difference. If we apply the Hadamard gate, then the mixed state will turn into $$\frac{1}{\sqrt2}|0\rangle+\frac{1}{\sqrt2}|1\rangle$$, because both $$|0\rangle$$ and $$|1\rangle$$ turn into that. Under observation it has equal likelihood of 1 and 0. However, if we apply Hadamard to the pure state, we will get $$|0\rangle$$, with no 1 component at all, and when measured it will certainly be 0!

For reference: $H|0\rangle=\frac{1}{\sqrt2}(|0\rangle+|1\rangle)$ $H|1\rangle=\frac{1}{\sqrt2}(|0\rangle-|1\rangle)$

Why is Hadamard considered so important? Does it "care" about the phase or something?

Since we know "any 1-qubit operation + CNOT" is a universal instruction set, we can prove that a different instruction set is universal by showing that it can perform any 1-qubit operation and CNOT.

$$|+\rangle=\frac{1}{\sqrt2}(|0\rangle+|1\rangle)$$ and $$|-\rangle=\frac{1}{\sqrt2}(|0\rangle-|1\rangle)$$. These are orthogonal and form a basis for 1-qubit states. So now we have two common bases for a qubit state; We call the 0 and 1 the Z-basis, and the + and - the X-basis.

### 9.1 Common Gates

#### 9.1.1 Z

 1 0 0 -1

Rotates 180 degrees aronud the z-axis of the Bloch sphere. Does not affect measurement probabilities.

$$Z|0\rangle=|0\rangle$$ and $$Z|1\rangle=-|1\rangle$$.

$$Z|+\rangle=|-\rangle$$ and $$Z|-\rangle=Z|+\rangle$$ (follows quickly from distributive prop of matrix multiplication).

In other words, $$Z$$ swaps $$|+\rangle$$ and $$|-\rangle$$ and flips the phase between $$|0\rangle$$ and $$|1\rangle$$.

#### 9.1.2 X

 0 1 1 0

Flips around xy-axis of the Bloch sphere.

$$X|0\rangle=|1\rangle$$ and $$X|1\rangle=|0\rangle$$.

$$X|+\rangle=|+\rangle$$ and $$X|-\rangle=-|-\rangle$$.

In other words, $$X$$ swaps $$|0\rangle$$ and $$|1\rangle$$ and flips the phase between $$|-\rangle$$ and $$|+\rangle$$.

#### 9.1.3 H

Swaps between the Z-basis and the X-basis! It's pretty obvious that it turns 0 and 1 into + and -, respectively. But it also turns + and - into 1 and 0: H|+〉=[1 1]+[1 -1]=[1 0] and H|-〉= [1 1]+[-1 1] = [0 1].

H2=1 also: Swaps basis then swaps back again.

Z=H X H because we are just performing a swap in a different basis. Clearly X=H Z H as well.

#### 9.1.4 CNOT

Also called CX'' because it applies X to the second bit when the first bit is 1.

It may seem like CNOT is "one-way" – it "reads" the first bit and "sets" the second bit. This isn't true. If you H(0) then CNOT(0,1), you end up with |00〉+|11〉. This state is clearly symmetrical with regrads to the two qubits. So you can't tell which bit the H and CNOT were applied to! This is why certain combinations of H and CNOT are equivalent to other combinatinos where the CNOT is reversed, while a "controlled not" on classical qubits would always be asymmetrical.

#### 9.1.5 CZ

Like CNOT, but applies Z to the second bit, ie flipping it in the X-basis.

Hadamard the second bit, then apply CNOT, then hadamard it again. Expressing this formally for a CNOT that acts on two adjacent bits: $CZ=(I\otimes H)CX(I\otimes H)$ And the same thing holds if we switch CZ and CX.

#### 9.1.6 SWAP

Keeps 00 and 11 the same, swaps 01 and 10.

SWAP*CZ*SWAP=CZ. Huh.

SWAP(U⮾U)SWAP=U⮾U

### 9.2 Gate Equivalences

SWAP(U⮾I)=(I⮾U)SWAP, and SWAP(I⮾U)SWAP=(U⮾I) (personal, i hope it's right)

(I⮾H)*CNOT(0,1)*H=CZ(0,1): Because H*X*H=Z, so we can do it conditionally as well.

$$(H\otimes H)CNOT_{1,2}(H\otimes H)=CNOT_{2,1}$$. But wait, doesn't CNOT12 leave the first bit unchanged?? How can hadamards change which bit is changed and which one is unchanged? Recall that CNOT=CX, and that X only causes a phase change in the X-basis.

### 9.3 Aside: Multiplying rank-one matrices

It's often convenient to write a complex matrix in terms of input-output pairs that we want it to have, rather than element-wise. For example, X can be written $$|1\rangle\langle0|+|0\rangle\langle1|$$. This clearly works when each "rank-one matrix" has only a single one in it, i.e., it's "selecting" a specific row and column to insert a 1 into. But, we can also write a Hadamard matrix like this: $$|-\rangle\langle0|+|+\rangle\langle1|$$, even though |-〉 and |+〉 have multiple nonzero entries. When is it acceptable to write a matrix as a sum like this, when the factors of each addend are not trivial? It's acceptable when the input vectors (the row vectors) are orthogonal to each other.

When the row vectors are orthogonal, any vector being multiplied on the right can be written as a sum of multiples of those orthogonal row vectors, plus perhaps some other orthogonal vectors that weren't included in the sum (those parts of the input vector simply won't affect the output).

For Hadamard specifically, we are lucky that $$|-\rangle\langle0|+|+\rangle\langle1| = |0\rangle\langle-|+|1\rangle\langle+|$$, but in general there is no commutativity rule for this.

### 9.4 Diagrams for Quantum Circuits

Much like traditional logic circuits with or, and, xor, etc. All gates have an equal number of inputs and outputs.

## 10 Quantum Information

You cannot copy a qubit! I.e., |x〉⮾|0〉 cannot be taken to |x〉⮾|x〉 for arbitrary x by a unitary matrix. Any operation matrix must work on the computational basis states TODO

We can make a specific unitary that can copy a state that's one of two orthogonal states (change basis, CNOT (which is "copy" for 0 and 1), then change back). But it won't work on anything else.

### 10.1 Quantum Teleportation

If you entangle qubits A and B, then separate them, then perform some quantum operations on B and C (which are local), then transmit some information classically to a person with the ability to perform quantum operations on A, it's possible to perform A:=C. Doing this wipes the state from C, so it's still not quantum cloning, but still, it's possible to teleport quantum state over a distance with just a classical communication scheme!

But how does this not violate the speed of light? Well, we don't actually change the measurement probabilities for A, is my understanding. Rather, we change the probability distribution of states on C without changing the overall measurement probabilities.

First, we setup bits 1 and 2 as an EPR pair, by H'ing one then CNOT'ing. Next, we

## 11 Quantum Algorithms

### 11.1 Deutch-Josza

Given an "oracle" function that takes n input bits and returns either one or zero. We know it's either constant, i.e. it doesn't care about the input bits, or it's balanced, i.e. half of the total input space causes it to return 1 and half of it causes it to return 0.

Classically, you'd have to try at least half of the inputs to determine with any certainty whether the function is constant or not. If you measure less than half, and they're all the same, you still can't be sure if it's balanced or not until you try the n/2+1'th input.

I.e., for n bits, the classical algorithm needs 2n/2+1 queries.

Clearly any arbitrary f is not going to be reversible, so we need to modify it a bit to use it in quantum. It just takes |x,0> to $$|x,0\oplus f(x)\rangle$$, so reversing this tells us nothing about f. This must be implemented via a unitary matrix.

Idk when he proved this, but apparently for any reversible operation there is a unitary/circuit that can perform it?

#### 11.1.1 The Circuit

Hadamard all the bits of x, the input to f. We have one additional bit, which stars as |1>. Hadamard all the bits, including the extra one. Run the oracle, with that extra bit as the "output" bit. Hadamard all the gates that were used as inputs, and then measure them. (I guess the oracle entangles all those input bits with the output a bit?)

When we take the tensor product of multiple |+> states, the resulting state is an equal superposition of every possible basis state however many bits.

So, right before we run the oracle, we have (equal superposition of n bits)⮾|-〉.

When the oracle yields zero, |-> stays the same (XOR with zero). When the oracle yields one, |-> becomes -|-> (like applying X, which phases in this basis). So the probability of the last bit being positive vs negative is equal to the probability of f being 1 or 0 throughout the entire input space. BUT, you'd have to measure that bit many times to determine with certainty that f is balanced or not, probably so many times that the probability of you correctly determining the nature of f would be the same as if you spent an equivalent amount of time giving f random inputs classically. So we have to do stuff to the input bits again after the unitary.

When we hadamard them all again, they should turn into their original states.

The most straightforward way to think about H is in terms of what output states each input state creates. Eg, to say that HHH|000> creates an equal superposition of all three-qubit states. But it's just as important to be able to say how each output state is created in terms of input states.

Applying H⮾n to any n-qubit state creates an equal-probability distribution of all n-qubit states. We must consider the phases, though. |000> gets mapped to all positive. |100> gets mapped to positive states when the first bit is zero, and negative states when the first bit is one.

Each output state is calculated by iterating over the input states and determining whether they contribute or detract from this output state. ∑nil

Define two gates:

• SUM(|a> |b> |c>) = |a> |b> |a+b+c>
• CARRY(|a>|b>|c>|d>) = |a>|b>|c+b>|d+bc+a(c+b)>=|a>|b>|c+b>|d+bc+ac+ab>. In English:

• |a> is the carry input, left unchanged.
• |b> is an addend, left unchanged.
• |c> is another addend, which is mutated to b+c.
• |d> is the carry output, set when at least two of the three adding inputs were nonzero.

The carry operation correctly calculates the carry bit for the next stage, but it doesn't properly perform the addition, because the third qubit is set to c+b, disregarding the last stage's carry bit.

Now, for the strategy:

1. Use CARRY successively on each pair of input bits to calculate all carry bits.
2. Use SUMs to add the carry bit input to each stage to the c+b already generated by the CARRY, turning c+b into a+c+b

In practice, after each SUM, we apply the inverse of CARRY to reset the bit we just used to zero. It's much simpler than any measurement based reset approach.

## 12 Hamiltonian Stuff

What does exp(M) mean for a matrix M? Since exp has a Taylor series that converges, and taylor series (centered at zero) only requires integer exponentiation of scalar multiplication of its argument, we could try to pass in a matrix instead of a scalar. If the matrix in question is Hermitian, it is symmetric, and then by the spectral theorem it has an orthonormal eigenbasis (it is diagonalizable). Now we can substitute into the expression for the Taylor polynomial of ex:

\begin{align*} e^x&=a_0x^0+a_1x^1+a_2x^2+\dots \\ &= a_0M^0+a_1M^1+a_2M^2+\dots \\ &= a_0(PD^0P^\dagger)+a_1(PD^1P^\dagger)+a_2(PD^2P^\dagger)+\dots \\ &= P(a_0D^0+a_1D^1+\dots)P^{-1} \\ &= P(\mathrm{diag}(a_0\lambda_1^0,a_0\lambda_2^0,\dots)+\mathrm{diag}(a_1\lambda_1^1,a_1\lambda_2^1,\dots)+\dots)P^{-1} \\ &= P(\mathrm{diag}((a_0\lambda_1^0+a_1\lambda_1^1+a_2\lambda_1^2+...) + (a_0\lambda_2^0 + a_1\lambda_2^1+\dots) + \dots))P^{-1} \\ &= P(e^{\lambda_1}+e^{\lambda_2}+e^{\lambda_3}+\dots)P^{-1} \end{align*}

I.e, it's the matrix with exp applied to all of its eigenvalues!

Exponential growth or decay in the real world is modeled by the differential equation $$f'(t)=rf(t)$$ with the solution $$f(t)=e^{rt}C$$. Well, apparently "evolution" in the quantum world is described by Schrödinger's equation, $$|x'(t)\rangle=-iH|x(t)\rangle$$, which has the solution $$|x(t)\rangle=e^{-iHt}|C\rangle$$ (notice how the constant is a vector). We want $$|x(t)\rangle$$ to always be a quantum state. Since |C> is a valid state, this occurs when $$e^{-iHt}$$ is unitary. Which means the eigenvalues of $$e^{-iHt}$$ must all be one. The eigenvalues are of the form $$e^{-i\lambda}$$ for each λ eigenvalue of H, and $$|e^{-i\lambda}|=1$$ exactly when λ is real, because then it's an Euler's formula thing.

A diagonalizable matrix has real eigenvalues exactly when $$H^\dagger=H$$, i.e. H is Hermitian. Because $$H=P\mathrm{diag}(\lambda_1, \lambda_2,\dots)P^\dagger$$ (how do we know P is unitary?) so $$H^\dagger=(P^\dagger)^\dagger\mathrm{diag}(\lambda_1^*,\lambda_2^*,\dots)P^\dagger$$ which is the same as H iff the eigenvalues are real.

$$e^{-iH}$$ is unitary exactly when H is Hermitian. All unitaries can be turned into a Hamlitonian, so the two representations are equiv. Given a unitary, we can find the Hamiltonian by looking at the eigenvalues of U. Each eigenvalue $$\mu_j$$ is equal to $$e^{-i\lambda_j}$$ for one of the eigenvalues of H. Furthermore, each $$|u_j|=1$$, so $$u_j=e^{it}$$ for some t. This yields $$e^{it}=e^{-i\lambda_j}$$, so $$i\log t=-i\log \lambda_j$$

If $$U=e^{-iH}$$, then $$\log U=-iH$$ or $$H=i\log U$$. The logarithm of a matrix is defined similarly to the exponential (in terms of the Taylor polynomial).

## 13 Q#

### 13.1using statements

Like with-qubits; they allocate qubits, you can do stuff with them, then they're released. A qubit used in an earlier using statement may be re-used in a later one, so it's important to leave them in a consistent state – this can be done with Reset(q), though it's recommended to do it manually.

borrowing takes a qubit that may not start in the |0> state. It might re-use qubits from a surrounding using, for example. The qubits should be left as they were found.

using ( (q1, q2) = (Qubit, Qubit()) ) {
H(q1);
}


### 13.2 Types

• Qubit: A qubit, not measured.
• Result: A measured result, one of One and Zero.
• Int
• BigInt
• Double
• Bool: true or false
• String
• Unit: like nil, always ()
• Pauli: An operator itself, Pauli{I,X,Y,Z}

Tuples are allowed and with nesting: (Result, (Int, Int)). Single element tuples are equal to their constituent.

Arrays are allowed and may be multidimensional. Arrays of tuples and stuff are fine. There is no need for these to be "rectangular"; i.e., arr may have a different length than arr for 2d arr.

## 14 Questions

• How is our "reversible computation" different than just storing all the arguments?
• If protein computing does not include effects from quantum physics, why do we need quantum computers to simulate it? If we could observe protein folding well, could we use it to compute?
• Why use complex numbers if we only care about magnitudes?
• Nature must "measure" a qubit when it is destroyed, otherwise it would be possible to use it for faster than light communication, or so Kevin said. But then wouldn't nature have to measure it a soon as it gets any distance away from you? I.e, it could be used for faster than light communication across a room. Nature "has" to measure it immediately?
• A more precise version of the above question: Does nature internally "measure" the qubits as soon as they're separated, so that, although both qubits are described by a probability distribution of states, those probabilities are depedent? I.e, if I measure $$a$$ while $$b$$ is on the way to a black hole, the state of $$a$$ before I measured it was the state corresponding to the actual version of $$b$$ if it were measured? So, i.e., I can tell what will happen when $$b$$ is measured after measuring $$a$$, or vice versa (assuming the probabilities are 1/0 so no ambiguity).
• What's really wrong with arbitrarily large quantum operation matrices? If locality is not restricted to 2 bits, there's not really a hard limit, right? And is the "solving the halting problem" really a real problem? Isn't the problem on determining the columns of that matrix, rather than using it as a quantum operation? After all we could actually construct such a matrix for, eg, 8 byte Java programs, if there was a halting detection algorithm. Or is the purpose of the size restriction on operation matrices simply to prevent defining a matrix as a function? But there's a limit at the number of qubits in the system anyway, unlike a turing machine with unlimited tape…
• How can we possibly do interesting operations with only a single qubit? Couldn't we just multiply all the unitaries then do it classically? Or does "useful" mean "comparable to classical computers" rather than "faster than classical computers", since you can do matrix multiplication with a single qubit?

Created: 2021-01-04 Mon 13:44

Validate