The Symmetry at the Heart of the Canonical Commutation Relation
The canonical commutation relation between the position operator $\hat{x}$ and momentum operator $\hat{p}$ is one of the most fundamental equations of quantum mechanics:
$$ [\hat{x}, \hat{p}] = i \hbar. $$
It’s what implies that you can’t precisely measure the position and momentum of a particle simultaneously. But where does it really come from? In this lesson, I want to explain the quantum origins of this equation based on symmetry principles.
Now, the world of quantum mechanics is very different from the classical mechanics that we’re all much more accustomed to. And we can’t derive quantum mechanics from classical laws like $F = ma$. Quite the opposite: it’s quantum mechanics that is the more fundamental theory, and classical mechanics emerges from it.
But there are close parallels between many of the equations of quantum and classical mechanics, as I’ve told you about in the last couple of mini-lessons. For example, we’ve seen that the quantum commutator plays a role similar to that of a classical operation called the Poisson bracket, up to a factor of $i \hbar$:
$$ \{\cdot, \cdot \} \to \frac{1}{i \hbar} [\cdot, \cdot]. $$
In particular, the Poisson bracket of the position $x$ and momentum $p$ in classical mechanics is $\{x, p\} = 1$. And if we apply the rule to turn this Poisson bracket into a commutator bracket divided by $i\hbar$, we indeed get the canonical commutation relation, $[\hat{x}, \hat{p}] = i \hbar$.
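As a quick sanity check on the classical side, here’s a small Python sketch that evaluates $\{f, g\} = \frac{\partial f}{\partial x}\frac{\partial g}{\partial p} - \frac{\partial f}{\partial p}\frac{\partial g}{\partial x}$ numerically for $f = x$ and $g = p$. The helper names `partial` and `poisson_bracket` are hypothetical, and the derivatives are approximated by central finite differences rather than computed exactly, so treat it as an illustration rather than a derivation.

```python
# Sketch: the classical Poisson bracket {f, g} = df/dx * dg/dp - df/dp * dg/dx,
# with derivatives approximated by central finite differences (an approximation).

def partial(func, x, p, wrt, h=1e-6):
    """Central-difference partial derivative of func(x, p) w.r.t. 'x' or 'p'."""
    if wrt == "x":
        return (func(x + h, p) - func(x - h, p)) / (2 * h)
    return (func(x, p + h) - func(x, p - h)) / (2 * h)


def poisson_bracket(f, g, x, p):
    """Poisson bracket {f, g} evaluated at the phase-space point (x, p)."""
    return (partial(f, x, p, "x") * partial(g, x, p, "p")
            - partial(f, x, p, "p") * partial(g, x, p, "x"))


def position(x, p):
    return x


def momentum(x, p):
    return p


# {x, p} should come out as 1 (up to rounding) at any phase-space point.
for x0, p0 in [(0.0, 0.0), (1.5, -2.0), (-3.0, 7.0)]:
    print((x0, p0), poisson_bracket(position, momentum, x0, p0))
```

Swapping the arguments flips the sign, $\{p, x\} = -1$, just as swapping the operators flips the sign of the commutator.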
But in this lesson I want to do better than just replacing curly brackets with square brackets and declaring voilà. I want to show you how the canonical commutation relation emerges from the symmetry principle we recently discussed: that momentum is the generator of spatial translations.
What we showed is that in classical mechanics the momentum defines a transformation that picks up our system and slides it over in space. And if this spatial translation is a symmetry, then the momentum is a conserved quantity.
I’m going to explain the quantum version of that same statement in this lesson, and show you how it essentially defines what we mean by momentum in quantum mechanics, and leads inevitably to the canonical commutation relation. To do that, I’m going to have to start with a whirlwind tour of the basics of quantum mechanics that we’ll need. If you haven’t seen much of it before, things are going to look a little strange, but I still think you’ll get a lot out of it, so stick with me! I’ll be making more lessons in time that flesh out all of these ideas.
In quantum mechanics, the state of a system, like a particle or atom or whatever else, is described by the state vector $|\psi\rangle$. $\psi$ is the Greek letter psi, and $|\cdot\rangle$ is the notation we usually use for vectors in quantum mechanics, called a “ket”. It’s a generalization of an ordinary column vector,
$$ \vec{V}=\begin{pmatrix} V_1 \\ V_2 \\ \vdots \end{pmatrix}. $$
The state vector $|\psi\rangle$ contains all the information we can get about our particle. The things we measure, like its position or momentum, say, correspond to operators that act on the state ($\hat{x}$ for the position operator and $\hat{p}$ for the momentum operator), where I’ll use a hat symbol to indicate the operator corresponding to a given quantity:
$$ \text{Position }\to\hat{x}|\psi\rangle, \quad \quad\text{Momentum} \to\hat{p} |\psi\rangle. $$
Whereas the state $|\psi\rangle$ was analogous to a column vector, an operator is analogous to a matrix. It acts on a state and gives you a new state, similar to how a matrix can multiply a column vector and give you another vector:
$$ \begin{pmatrix} & & \\ &\ddots & \\ & & \end{pmatrix} \begin{pmatrix} \\ \vdots \\ \\ \end{pmatrix} = \begin{pmatrix}\\ \vdots \\ \\ \end{pmatrix}. $$
You can also act multiple operators on a state, like $\hat{x} \hat{p} |\psi\rangle$, but in general the order matters: $\hat{x} \hat{p}$ and $\hat{p} \hat{x}$ do different things, which is what we mean when we say the operators don’t commute. Their failure to commute is quantified by their commutator, defined by $[\hat{x},\hat{p}] = \hat{x} \hat{p} - \hat{p} \hat{x}$. The goal of this lesson is to show you how symmetry dictates that this particular commutator is just a number, $i\hbar$.
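Here’s a tiny numerical illustration of the point about ordering, in Python with NumPy. The two 2×2 matrices are arbitrary stand-ins (they happen to be the Pauli matrices $\sigma_x$ and $\sigma_z$), not the position and momentum operators, and the `commutator` helper is just a name chosen for this sketch.

```python
import numpy as np

# Illustrative stand-ins: the Pauli matrices sigma_x and sigma_z (NOT x-hat and p-hat).
A = np.array([[0, 1], [1, 0]], dtype=complex)
B = np.array([[1, 0], [0, -1]], dtype=complex)
psi = np.array([1, 0], dtype=complex)   # a state written as a column vector


def commutator(X, Y):
    """[X, Y] = XY - YX."""
    return X @ Y - Y @ X


ab_psi = A @ (B @ psi)   # act with B first, then A
ba_psi = B @ (A @ psi)   # act with A first, then B
print(ab_psi, ba_psi)    # two different vectors: the order matters
print(commutator(A, B))  # a nonzero matrix
```

One caveat worth knowing: no pair of finite $N \times N$ matrices can satisfy $[X, P] = i\hbar$ times the identity exactly, because the trace of any commutator vanishes while the trace of $i\hbar \cdot 1$ is $i\hbar N$. That’s why $\hat{x}$ and $\hat{p}$ need an infinite-dimensional space of states, and why matrix pictures like this one are only analogies.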
Another operation on regular old vectors that you’re probably familiar with is the dot product (also called the inner product), $\vec{V}\cdot \vec{W}$, which takes two vectors and returns a number. It essentially tells us how much the vectors overlap with each other, at least when one of them is a unit vector. The notation that we use for the analogous operation for two quantum states, $|\psi\rangle$ and $|\phi\rangle$ say, is $\langle\phi|\psi\rangle$. We call the state $|\psi\rangle$ a “ket,” and the flipped object $\langle\phi|$ a “bra,” so that when you put them together as $\langle\phi|\psi\rangle$ you get a bra-ket, or bracket. And yes, that’s a nearly 100-year-old physics pun from Paul Dirac, who introduced the notation.
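For ordinary complex column vectors, the bracket $\langle\phi|\psi\rangle$ is the sum $\sum_i \phi_i^* \psi_i$: the bra is the conjugate transpose of the ket. A minimal NumPy sketch with arbitrary example vectors; note that `np.vdot` conjugates its first argument, which matches the bra slot.

```python
import numpy as np

# Two example states as complex column vectors (arbitrary, normalized).
phi = np.array([1, 1j], dtype=complex) / np.sqrt(2)
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)

# <phi|psi> = sum_i conj(phi_i) * psi_i; np.vdot conjugates its FIRST argument.
phi_psi = np.vdot(phi, psi)        # equals (1 - 1j)/2
psi_phi = np.vdot(psi, phi)        # equals (1 + 1j)/2, the complex conjugate of <phi|psi>
norm_sq = np.vdot(phi, phi).real   # <phi|phi> = 1 for a unit vector

print(phi_psi, psi_phi, norm_sq)
```

Swapping the order gives the complex conjugate, $\langle\psi|\phi\rangle = \langle\phi|\psi\rangle^*$, so the squared magnitude of an overlap doesn’t depend on the order.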
Say we want to find out where the particle is. In general, even if we’re told the state $|\psi\rangle$ of the particle, we can’t say for sure where it is until we make a measurement. In fact, the weirdness of quantum mechanics is that the particle typically didn’t even have a well-defined position before you measured it. All the state $|\psi\rangle$ can tell us is the probability of finding the particle at position $x$, say.
What we can do is define another state vector $|x\rangle$ which describes a particle that is precisely at position $x$. Then the probability of finding our particle with its state $|\psi\rangle$ at that location is given by taking the bra-ket overlap between the two, $\langle x|\psi\rangle$, and then taking its squared magnitude (the overlap is in general a complex number):
$$ P(x) = |\langle x|\psi \rangle|^2. $$
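This is the probability of finding the particle at position $x$ (strictly speaking, because $x$ is continuous, $P(x)$ is a probability density: $P(x)\,\mathrm{d}x$ is the probability of finding the particle between $x$ and $x + \mathrm{d}x$). The overlap $\langle x|\psi\rangle$ is called the wavefunction $\psi(x)$ of the state, so we can alternatively express the probability as $P(x) = |\psi(x)|^2$. The bigger this function is at a given $x$, the more likely you are to find the particle there when you make your measurement.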
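To see how a wavefunction turns into probabilities, here’s a short NumPy sketch that samples an assumed Gaussian $\psi(x)$ on a grid. The grid spacing, center, and width are arbitrary choices, and sums multiplied by the spacing `dx` stand in for integrals.

```python
import numpy as np

# Sketch: an (assumed) Gaussian wavefunction on a grid; dx-weighted sums stand in for integrals.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
x0, sigma = 2.0, 1.0
psi = np.exp(-(x - x0) ** 2 / (4 * sigma ** 2)).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)     # normalize: total probability 1

P = np.abs(psi) ** 2                              # probability density |psi(x)|^2
total = np.sum(P) * dx                            # ~1
peak = x[np.argmax(P)]                            # most likely position, ~2.0
window = (x > 1.0) & (x < 3.0)
prob_1_to_3 = np.sum(P[window]) * dx              # ~0.68: within one sigma of the peak
print(total, peak, prob_1_to_3)
```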
For example, the particle might be in a state where it will be found at point A with probability 1/3 or at point B with probability 2/3. We don’t know which value we’ll get until we make the measurement. And before we do measure, the particle isn’t really localized at one or the other. If we set up a bunch of identical copies of the system side-by-side, each in this particular state, and then measure the position of the particle in each, a third of the time you’ll find it at A, and two-thirds of the time you’ll find it at B.
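The “many identical copies” picture is easy to simulate. In this Python sketch (standard library only), the amplitudes $\sqrt{1/3}$ and $\sqrt{2/3}$ are one choice consistent with the probabilities in this example, the helper `measure_copies` is hypothetical, and a seeded pseudo-random number generator plays the role of nature.

```python
import random

# Hypothetical two-location state: amplitudes whose squared magnitudes are 1/3 and 2/3.
amplitudes = {"A": (1 / 3) ** 0.5, "B": (2 / 3) ** 0.5}
probabilities = {k: abs(a) ** 2 for k, a in amplitudes.items()}


def measure_copies(n_copies, seed=0):
    """Measure the position of n identically prepared copies; return outcome counts."""
    rng = random.Random(seed)
    outcomes = rng.choices(list(probabilities),
                           weights=list(probabilities.values()), k=n_copies)
    return {k: outcomes.count(k) for k in probabilities}


counts = measure_copies(100_000)
print(counts)   # roughly a third "A" and two thirds "B"
```

No single run tells you where one particular particle “was”; only the frequencies over many copies reproduce $P(A)$ and $P(B)$.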
That’s profoundly bizarre. However, my aim right now isn’t to go down the rabbit hole of what it means to make a measurement in quantum mechanics, but just to tell you this basic fact: when you make a measurement corresponding to an operator like $\hat{x}$, all you can report beforehand, if you know the state $|\psi\rangle$, are the probabilities of getting various values of the position. Therefore, you can’t in general say where the particle will be; you can only predict the average value of where it might be, over many measurements on identically prepared copies. This average is called the expectation value of the operator, $\hat{x}$ in this case, and it’s given by sandwiching the operator between the bra and ket for the given state, $\langle \psi | \hat{x} |\psi\rangle$. This just means to act $\hat{x}$ on $|\psi\rangle$, which gives you another vector $\hat{x}|\psi\rangle$, and then to take the inner product of that state with $\langle\psi|$.
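On a grid of positions, the position operator acts like a diagonal matrix with the grid points on the diagonal, so the sandwich $\langle\psi|\hat{x}|\psi\rangle$ reduces to the probability-weighted average of $x$. A NumPy sketch, where the grid, the Gaussian state centered at $x = 1.5$, and the factor `dx` (standing in for an integral) are illustrative assumptions:

```python
import numpy as np

# Grid and state are illustrative assumptions.
x = np.linspace(-10.0, 10.0, 801)
dx = x[1] - x[0]
psi = np.exp(-(x - 1.5) ** 2 / 2).astype(complex)
psi /= np.sqrt(np.vdot(psi, psi).real * dx)       # normalize: <psi|psi> = 1

X = np.diag(x)                                    # position operator as a diagonal matrix
x_hat_psi = X @ psi                               # the new vector x-hat |psi>
expectation = np.vdot(psi, x_hat_psi).real * dx   # <psi| x-hat |psi>
average_from_P = np.sum(x * np.abs(psi) ** 2) * dx
print(expectation, average_from_P)                # both ~1.5
```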
Okay, those were the essential elements of quantum mechanics that we need to accomplish the current aim, which again is to explain what it means that momentum is the generator of spatial translation symmetry in quantum mechanics, and then to show how that implies the canonical commutation relation. That’s what we’ll do now.
First of all, what does it mean to have a symmetry in quantum mechanics? Like any other transformation, a symmetry will be represented by an operator, $\hat{U}$ say, that acts on the space of quantum states. And if this transformation is to be a symmetry, it had better not change any of our probabilities.
So, let an operator $\hat{U}$ act on our state $|\psi\rangle$ and turn it into a new state $\hat{U}|\psi\rangle$. Likewise, it turns $|x\rangle$ into $\hat{U} |x\rangle$ and, when we flip that around to make the corresponding bra, it becomes $\langle x| \hat{U}^{\dagger}$, where $\hat{U}^\dagger$ (pronounced “U dagger”) is called the adjoint of $\hat{U}$. Again, that’s something you might have encountered before in studying matrices, where to find the adjoint you simply take the transpose of the matrix and then complex conjugate it.
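In matrix language, the adjoint is the conjugate transpose, and turning the ket $\hat{U}|x\rangle$ into a bra produces $\langle x|\hat{U}^\dagger$. A small NumPy sketch with an arbitrary random matrix (not a physical operator) confirms that the two ways of computing the overlap agree:

```python
import numpy as np

rng = np.random.default_rng(1)
# An arbitrary complex matrix standing in for the operator U (illustrative only).
U = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U_dagger = U.conj().T                  # adjoint: transpose, then complex conjugate

psi = rng.normal(size=3) + 1j * rng.normal(size=3)
x_ket = rng.normal(size=3) + 1j * rng.normal(size=3)

# The bra belonging to the ket U|x> is <x|U^dagger, so both ways of
# computing the overlap with |psi> agree.
lhs = np.vdot(U @ x_ket, psi)          # (U|x>)^dagger |psi>
rhs = np.vdot(x_ket, U_dagger @ psi)   # <x| (U^dagger |psi>)
print(np.isclose(lhs, rhs))            # True
```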
If this transformation is going to be a symmetry, it has to leave our probability function $P(x) = |\langle x|\psi\rangle|^2$ unchanged. But it sends
$$ \langle x|\psi\rangle \to \langle x|\hat{U}^\dagger \hat{U} |\psi\rangle. $$
If this is to be invariant, then the operator should satisfy $\hat{U}^\dagger \hat{U} = 1$. In other words, the adjoint of $\hat{U}$ should be the same as its inverse, $\hat{U}^\dagger = \hat{U}^{-1}$, so that when you apply them in sequence you undo the transformation and get 1. Operators that satisfy this special property are called unitary.
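Here’s a quick NumPy check of this invariance. The random unitary (taken from a QR decomposition), the 4-dimensional space, and the basis vector standing in for $|x\rangle$ are all illustrative choices, not anything specific to position states.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# One convenient way to get a random unitary: the Q factor of a QR decomposition.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(Z)
is_unitary = np.allclose(U.conj().T @ U, np.eye(n))   # U^dagger U = 1

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
x_ket = np.zeros(n, dtype=complex)
x_ket[0] = 1.0                                        # a basis state standing in for |x>

before = abs(np.vdot(x_ket, psi)) ** 2                # |<x|psi>|^2
after = abs(np.vdot(U @ x_ket, U @ psi)) ** 2         # |<x|U^dagger U|psi>|^2
print(is_unitary, np.isclose(before, after))          # True True

# A non-unitary operator such as M = 2 * identity multiplies the "probability" by 16.
M = 2 * np.eye(n)
ratio = abs(np.vdot(M @ x_ket, M @ psi)) ** 2 / before
print(ratio)
```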
We therefore learn that if our symmetry transformation is implemented by a unitary operator $\hat{U}$, it will preserve the probability function $P(x)$ for the position $x$, as well as the probability functions for any other variables, just like we wanted. Most symmetry transformations in quantum mechanics are therefore represented by unitary operators.
Classically, we learned in the last lesson that momentum is the generator of spatial translations, meaning that $p$ defines a transformation that shifts the position $x$ of the particle over:
$$ x(\lambda) = x_0 + \lambda. $$
In quantum mechanics, we’re therefore looking for a symmetry operator $\hat{U}(\lambda)$ whose effect is to shift the position operator $\hat{x}$ by $\lambda$,
$$ \hat{U}(\lambda): \hat{x}\to \hat{x}+\lambda. $$
To understand how this works, consider how a general transformation changes the expectation value of $\hat{{}x}$:
$$ \langle \psi|\hat{x}|\psi\rangle \to \langle \psi|\hat{U}^{-1}\hat{x} \hat{U} |\psi\rangle. $$
Notice that, as far as the expectation value is concerned, we can implement the transformation just as well by leaving the state alone and instead replacing the operator by
$$ \hat{x} \to \hat{U}^{-1} \hat{x} \hat{U}. $$
Therefore, the translation symmetry we’re looking for is defined by
$$ \hat{U}^{-1}(\lambda)\hat{x} \hat{U}(\lambda) = \hat{x} + \lambda. $$
Now how does the momentum operator factor into all this? $\hat{U}(\lambda)$ defines a translation by $\lambda$ for any value of this parameter. In particular, we can take $\lambda$ to be really tiny, so that we’re talking about an infinitesimal shift. When $\lambda = 0$, we haven’t done anything at all, of course, and so $\hat{U}(\lambda = 0) = 1$. Then when we turn on a small value for $\lambda$, we’ll get some small transformation that’s only slightly shifted away from the identity operator, and we can write it as
$$ \hat{U}(\lambda) = 1 -\frac{i}{\hbar} \lambda \hat{G} + \cdots, $$
for some other operator $\hat{G}$. The $\cdots$ stands for higher powers of $\lambda$ that only become important when $\lambda$ is no longer small. $\hat{G}$ is called the quantum generator of the symmetry transformation $\hat{U}$, and the factor of $-i/\hbar$ that we pulled out front is a matter of convention. The $\hbar$ ensures $\hat{G}$ has the units that we want (those of momentum, when $\lambda$ is a length), and the $i$ ensures that $\hat{G}$ itself is real in an appropriate sense: for $\hat{U}$ to be unitary, $\hat{G}$ must be Hermitian, $\hat{G}^\dagger = \hat{G}$, which is the operator version of being a real number.
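To see how a Hermitian generator goes with a unitary transformation, here’s a NumPy sketch. It builds the finite transformation as the exponential $e^{-i\lambda\hat{G}/\hbar}$ of a random Hermitian matrix (the exponential form and the helper `exp_generator` are assumptions of the sketch, with $\hbar = 1$), checks that the result is unitary, and confirms that $1 - \frac{i}{\hbar}\lambda\hat{G}$ matches it up to an error of order $\lambda^2$.

```python
import numpy as np

hbar = 1.0   # units with hbar = 1 (assumption for the sketch)


def exp_generator(G, lam):
    """Finite transformation exp(-i lam G / hbar) for Hermitian G, via eigendecomposition."""
    evals, V = np.linalg.eigh(G)
    return V @ np.diag(np.exp(-1j * lam * evals / hbar)) @ V.conj().T


rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
G = (A + A.conj().T) / 2          # a random Hermitian generator (illustrative)
I = np.eye(4)

U = exp_generator(G, 0.3)
is_unitary = np.allclose(U.conj().T @ U, I)
print(is_unitary)                 # True

# How well does 1 - (i/hbar) lam G approximate U(lam)? The error should scale as lam**2.
errors = []
for lam in (1e-2, 1e-3):
    err = np.linalg.norm(exp_generator(G, lam) - (I - 1j * lam * G / hbar))
    errors.append(err)
    print(lam, err)
print(errors[0] / errors[1])      # close to 100
```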
Based on our classical experience with momentum being the generator of translations, let’s now define the quantum momentum operator $\hat{p}$ to be the generator $\hat{G}$ of this transformation.
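We can watch this definition at work numerically, checking that a translation generated by momentum shifts $\langle\hat{x}\rangle$ by $\lambda$, as in $\hat{U}^{-1}(\lambda)\hat{x}\hat{U}(\lambda) = \hat{x} + \lambda$. The sketch assumes the standard position-space form $\hat{p} = -i\hbar\,\mathrm{d}/\mathrm{d}x$ (not derived in this lesson) and takes the finite translation to be $\hat{U}(\lambda) = e^{-i\lambda\hat{p}/\hbar}$, which becomes multiplication by $e^{-ik\lambda}$ in Fourier space. It uses NumPy’s FFT on a periodic grid with $\hbar = 1$; the grid size, the Gaussian state, and the helper names `translate` and `expect_x` are choices made for the illustration.

```python
import numpy as np

hbar = 1.0                                          # units with hbar = 1
N, L = 1024, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)   # periodic grid
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)             # p-hat eigenvalues are hbar * k


def translate(psi, lam):
    """Apply U(lam) = exp(-i lam p / hbar), assuming p = -i hbar d/dx.

    In Fourier space this is multiplication by exp(-i k lam), which
    shifts the wavefunction: psi(x) -> psi(x - lam) (with periodic wrap-around)."""
    return np.fft.ifft(np.exp(-1j * k * lam) * np.fft.fft(psi))


def expect_x(psi):
    """<psi|x|psi> on the grid (dx-weighted sum standing in for the integral)."""
    return np.sum(x * np.abs(psi) ** 2) * dx


psi = np.exp(-(x + 3.0) ** 2)
psi = psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)

lam = 2.5
mean_before = expect_x(psi)                  # ~ -3.0
mean_after = expect_x(translate(psi, lam))   # ~ -3.0 + 2.5 = -0.5
print(mean_before, mean_after)
```

Because multiplying by $e^{-ik\lambda}$ only changes phases in Fourier space, the translated state keeps its normalization, which is the unitarity we asked of a symmetry.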
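To first order in $\lambda$, the inverse of this expression just flips the sign, $\hat{U}^{-1}(\lambda) = 1 + \frac{i}{\hbar}\lambda \hat{p}+\cdots$ (you can check that the product of the two is $1$ up to terms of order $\lambda^2$), and so our definition of a spatial translation for small $\lambda$ becomes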
$$ \left(1 + \frac{i}{\hbar} \lambda \hat{p} \right) \hat{x}\left(1 - \frac{i}{\hbar} \lambda \hat{p} \right) = \hat{x}+\lambda. $$
If we multiply out the left-hand-side, we get
$$ \hat{x} + \frac{i}{\hbar}\lambda \hat{p} \hat{x} -\frac{i}{\hbar} \lambda \hat{x} \hat{p} , $$
where I’ve dropped the $\lambda^2$ term because we’re taking $\lambda$ to be infinitesimally small. Then we obtain
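Dropping the $\lambda^2$ term can be checked numerically without committing to any particular $\hat{x}$ and $\hat{p}$. In this NumPy sketch, random Hermitian matrices `X` and `P` are illustrative stand-ins (they don’t satisfy the translation property; this only checks the algebra), $\hbar = 1$, and the exact finite transformation is taken to be $\hat{U}(\lambda) = e^{-i\lambda \hat{P}/\hbar}$, an assumed exponential form. The gap between the exact $\hat{U}^{-1}\hat{X}\hat{U}$ and the first-order result $\hat{X} - \frac{i}{\hbar}\lambda[\hat{X}, \hat{P}]$ should shrink like $\lambda^2$.

```python
import numpy as np

hbar = 1.0
rng = np.random.default_rng(4)


def random_hermitian(n):
    """A random Hermitian matrix (illustrative stand-in for an observable)."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2


X, P = random_hermitian(5), random_hermitian(5)


def U(lam):
    """exp(-i lam P / hbar), built from the eigendecomposition of Hermitian P."""
    evals, V = np.linalg.eigh(P)
    return V @ np.diag(np.exp(-1j * lam * evals / hbar)) @ V.conj().T


errors = []
for lam in (1e-1, 1e-2, 1e-3):
    exact = U(lam).conj().T @ X @ U(lam)               # U^{-1} X U (U is unitary)
    first_order = X - 1j * lam / hbar * (X @ P - P @ X)
    errors.append(np.linalg.norm(exact - first_order))
    print(lam, errors[-1])                             # shrinks roughly like lam**2
```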
$$ \hat{x} -\frac{i}{\hbar} \lambda(\hat{x} \hat{p} - \hat{p}\hat{x}) = \hat{x}+\lambda. $$
As we defined before, the difference $\hat{x} \hat{p} - \hat{p}\hat{x} = [\hat{x},\hat{p}]$ is the commutator of the operators $\hat{x}$ and $\hat{p}$. Cancelling $\hat{x}$ from both sides and dividing by $-i\lambda/\hbar$ (using $1/(-i) = i$), we find
$$ [\hat{x},\hat{p}] = i \hbar. $$
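As a numerical cross-check, the sketch below applies $\hat{x}\hat{p} - \hat{p}\hat{x}$ to a smooth test wavefunction on a grid and compares the result with $i\hbar\,\psi$. It assumes the position-space form $\hat{p} = -i\hbar\,\mathrm{d}/\mathrm{d}x$ (standard, but not derived here), sets $\hbar = 1$, and approximates the derivative with `np.gradient`, so agreement is only up to discretization error.

```python
import numpy as np

hbar = 1.0
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]


def p_op(f):
    """Momentum in the position representation, assumed p = -i hbar d/dx
    (central differences via np.gradient)."""
    return -1j * hbar * np.gradient(f, dx)


psi = np.exp(-x ** 2 / 2) * np.exp(1j * 0.7 * x)   # a smooth test state (illustrative)

commutator_psi = x * p_op(psi) - p_op(x * psi)      # [x, p] psi = (x p - p x) psi
interior = slice(10, -10)                            # skip the one-sided edge stencils
max_err = np.max(np.abs(commutator_psi[interior] - 1j * hbar * psi[interior]))
print(max_err)                                       # small, set by the grid spacing
```

On the grid, the discrete commutator actually works out to $i\hbar$ times a nearest-neighbour average, $(\psi_{j+1} + \psi_{j-1})/2$, rather than exactly $i\hbar\,\psi_j$; the difference shrinks like the square of the grid spacing.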
At last then, by defining the momentum operator as the generator of translations and looking at an infinitesimal symmetry, we have discovered the canonical commutation relation!
All this machinery is very general. For example, time translation symmetry, which we saw is classically generated by the Hamiltonian, is quantum mechanically described by the unitary transformation
$$ \hat{U}(t) = 1 - \frac{i}{\hbar} t \hat{H} + \cdots, $$
where the infinitesimal generator is now defined as $\hat{H}$, the Hamiltonian operator. Under this transformation, an operator like $\hat{p}$ transforms as
$$ \hat{p} \to\hat{U}^{-1}(t) \hat{p} \hat{U}(t), $$
which, when $t$ is small, becomes
$$ \left( 1 +\frac{i}{\hbar} t \hat{H} \right) \hat{p} \left( 1 -\frac{i}{\hbar} t \hat{H} \right) = \hat{p} -\frac{i}{\hbar} t [\hat{p},\hat{H}]. $$
Thus, if the quantum momentum is to be constant in time, it must commute with the Hamiltonian operator:
$$ \frac{\mathrm{d} \hat{p} }{\mathrm{d} t } =-\frac{i}{\hbar}[\hat{p},\hat{H}] = 0. $$
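Here’s a numerical version of this statement, sketched with NumPy under several assumptions: $\hbar = m = 1$; a periodic grid on which $\hat{p}$ is the matrix that is diagonal ($\hbar k$) in the discrete Fourier basis, the grid stand-in for $-i\hbar\,\mathrm{d}/\mathrm{d}x$; a free Hamiltonian $\hat{p}^2/2m$; and a hypothetical potential $V(x) = \cos x$ for comparison. Rather than evolving $\hat{p}$ itself, it evolves the state with $\hat{U}(t) = e^{-it\hat{H}/\hbar}$, which gives the same expectation values, and compares $\langle\hat{p}\rangle$ before and after.

```python
import numpy as np

hbar, m = 1.0, 1.0                                   # units chosen for the sketch
N = 128
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)   # periodic grid
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])     # wavenumbers on this grid

# Grid momentum operator: diagonal (hbar * k) in the discrete Fourier basis,
# playing the role of -i hbar d/dx (an assumption of this sketch).
F = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)       # unitary DFT matrix
P = F.conj().T @ np.diag(hbar * k) @ F


def evolve(H, psi, t):
    """Apply U(t) = exp(-i t H / hbar) using the eigendecomposition of Hermitian H."""
    evals, V = np.linalg.eigh(H)
    return V @ (np.exp(-1j * t * evals / hbar) * (V.conj().T @ psi))


def expect(A, psi):
    """<psi|A|psi> for a unit-norm vector psi."""
    return np.vdot(psi, A @ psi).real


psi0 = np.exp(-(x - np.pi / 2) ** 2 / (2 * 0.5 ** 2)).astype(complex)
psi0 /= np.linalg.norm(psi0)

H_free = P @ P / (2 * m)                  # free particle: commutes with P
H_pot = H_free + np.diag(np.cos(x))       # hypothetical V(x) = cos x breaks translations

results = {}
for name, H in (("free", H_free), ("cos potential", H_pot)):
    results[name] = (expect(P, psi0), expect(P, evolve(H, psi0, t=0.5)))
    print(name, results[name])            # <p> before and after evolving for t = 0.5
```

With the free Hamiltonian $\langle\hat{p}\rangle$ stays put; with the potential, the force $-V'(x) = \sin x$ pushes the packet and $\langle\hat{p}\rangle$ grows.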
Meanwhile, we can ask how the Hamiltonian operator transforms under our spatial translation from earlier:
$$ \hat{H} \to \hat{U}^{-1}(\lambda) \hat{H} \hat{U}(\lambda) = \hat{H} - \frac{i}{\hbar} \lambda [\hat{H},\hat{p}] + \cdots, $$
and so the Hamiltonian is in turn invariant under spatial translations if it commutes with $\hat{p}$. (I’m using the same symbol $\hat{U}$ here to denote the spatial translation and time translation operators, and just letting the argument $\lambda$ or $t$ indicate which one I mean.)
Thus, the momentum will be constant in time if and only if spatial translations are a symmetry of the Hamiltonian. This is an example of the quantum version of the Hamiltonian Noether theorem.
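The two halves of this argument can be checked side by side in a small grid model. The assumptions are the same kind as before: $\hbar = m = 1$, a periodic grid, $\hat{p}$ taken diagonal ($\hbar k$) in the discrete Fourier basis, the finite translation taken as $\hat{U}(\lambda) = e^{-i\lambda\hat{p}/\hbar}$, and a hypothetical potential $V(x) = \cos x$. For each Hamiltonian, the NumPy sketch measures the size of $[\hat{H}, \hat{p}]$ and how much $\hat{H}$ changes under the translation; the two vanish together.

```python
import numpy as np

hbar, m = 1.0, 1.0
N = 64
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=x[1] - x[0])
F = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)      # unitary DFT matrix
P = F.conj().T @ np.diag(hbar * k) @ F              # grid momentum (assumed form)


def translate_op(lam):
    """U(lambda) = exp(-i lambda P / hbar); P has eigenvalues hbar*k, so this is exp(-i k lambda)."""
    return F.conj().T @ np.diag(np.exp(-1j * k * lam)) @ F


H_free = P @ P / (2 * m)
H_pot = H_free + np.diag(np.cos(x))                 # hypothetical potential V(x) = cos x

U = translate_op(0.4)
results = {}
for name, H in (("free", H_free), ("cos potential", H_pot)):
    comm_size = np.linalg.norm(H @ P - P @ H)           # size of [H, p]
    change = np.linalg.norm(U.conj().T @ H @ U - H)     # U^{-1} H U - H (U is unitary)
    results[name] = (comm_size, change)
    print(name, comm_size, change)   # both tiny for "free", both sizable with the potential
```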
If you’re new to quantum mechanics then all this information and notation was probably a little overwhelming, but it will make more and more sense the more you learn! I’ll be posting more lessons diving deeper into what some of the ingredients we talked about here mean, so stay tuned.
If you encounter any errors on this page, please let me know at feedback@PhysicsWithElliot.com.