Symmetries & Conservation Laws in Field Theory
The relationship between symmetries of nature and conservation laws in physics is one of the most profound connections that human beings have understood about the universe since we started doing science. Symmetries are so fundamental that the standard model of particle physics, which is the most predictive theory that scientists have ever written down, is typically denoted simply by its symmetry group, called “$\mathrm{SU(3)\times SU(2) \times U(1)}$”. In this lesson, we’re going to explore the most basic symmetry at the heart of the standard model: the symmetry that underlies electromagnetism and the conservation of electric charge.
We’ve seen in past lessons how symmetries are tied together with conservation laws by Noether’s theorem. Translation symmetry in space, for example, is tied to momentum conservation, meaning that if you can pick your system up and slide it over without changing anything about the physics, then the total momentum in that direction is a constant. Rotational symmetry likewise leads to angular momentum conservation. And energy conservation follows from translation symmetry in time, meaning that the dynamics of your system looked the same yesterday as they will tomorrow.
What Noether’s theorem says is that if a system of particles described by a Lagrangian $L$ has a symmetry, then you’re guaranteed to find a corresponding conserved quantity $Q$, meaning that if you evaluate $Q$ at any time $t$, you’ll always get back the same number:
$$ \frac{\mathrm{d} Q}{\mathrm{d} t } = 0. $$
In the last mini-lesson, though, we started discussing field theory, where we’re not only interested in how the coordinates $x(t)$ of a bunch of particles move around, but in fluctuating fields $\phi(t,x)$ that permeate space and time, interacting with particles and, potentially, with each other. The most intuitive examples to keep in mind are the electric and magnetic fields that propagate the electromagnetic forces between charged particles, and that are bouncing off your eyeballs at this very moment.
In a field theory like electromagnetism, the connection between symmetries and conservation laws becomes even deeper. You’re no doubt familiar with the conservation of electric charge, for example. But the conservation of charge isn’t simply a statement that the total amount of electric charge in the universe is a constant. For example, if an electron disappeared from Tokyo at the same moment a muon appeared next to Tau Ceti, the total amount of electric charge would not have changed. But a conservation law in field theory is stronger than that: charge is conserved locally, meaning that the only way the amount of charge can change inside any box, large or small, is if a current continuously carries charges in or out.
But if conservation laws result from symmetries of nature, then what symmetry is responsible for the conservation of electric charge?
That’s what I want to tell you about in this lesson. We’ll see how local conservation laws arise in field theory, and how they’re captured by the continuity equation
$$ \frac{\partial \rho}{\partial t } + \frac{\partial J_x}{\partial x } + \frac{\partial J_y}{\partial y }+\frac{\partial J_z}{\partial z } = 0\iff \partial_\mu J^\mu = 0, $$
where $\rho$ is the charge density and $\boldsymbol J$ is the current density. And we’ll identify the symmetry, known as “$\mathrm{U(1)}$”, that Noether’s theorem ties to the conservation of electric charge in a theory like electromagnetism. This is the simplest component of the standard model of particle physics, although I should clarify that the $\mathrm{U}(1)_\mathrm{EM}$ of electromagnetism is not literally the $\mathrm{U}(1)$ in the $\mathrm{SU}(3)\times \mathrm{SU}(2)\times \mathrm{U}(1)_\mathrm{Y}$ symmetry that labels the standard model. Instead, $\mathrm{U(1)_{EM}}\subset \mathrm{SU(2)\times U(1)_Y}$ is a component sitting inside the standard model, which falls out after the Higgs mechanism and its famous Higgs particle do their business. That part of the story will have to wait for another day, though.
Let’s start off by understanding what it means for electric charge to be locally conserved, which you may or may not have learned about before in a class on E&M, and after that we’ll see how conservation laws like these arise naturally from Noether’s theorem for any field theory. Say we have some volume of space $R$, and we count up the amount of charge $Q$ inside of it. $R$ might be the inside of a cubical box, for example, or it could be some complicated shape. To measure the amount of charge inside the box, we start from the charge density $\rho(t,\boldsymbol r)$, which represents the amount of charge per unit volume at any point $\boldsymbol r = (x,y,z)$ in space at any time $t$. In other words, if we look at an infinitesimally tiny box at a point $\boldsymbol r$, the amount of charge inside it is the charge per volume, $\rho$, times the volume of the little box, $\mathrm{d}x\,\mathrm{d}y \,\mathrm{d}z$, which I’ll write as $\mathrm{d}^3\boldsymbol r$ for short. To find the total charge inside our actual box $R$, we just dice it up into lots of little pieces like this, each with charge $\rho \,\mathrm{d}^3\boldsymbol r$, and then add them all up by integrating over the region:
$$ Q(t) = \int\limits_R \mathrm{d}^3\boldsymbol r~\rho(t,\boldsymbol r). $$
This is the total amount of charge in our box $R$ at time $t$.
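To make the "dice it up and add" recipe concrete, here's a minimal numerical sketch. The density `rho`, the unit-cube region, and the helper name `total_charge` are all made up by me for illustration; nothing about them is special.

```python
def total_charge(rho, n=20, side=1.0):
    """Approximate Q = integral of rho over the cube [0, side]^3.

    The cube is diced into n**3 small cells, and each cell contributes
    rho(cell centre) * cell volume (the midpoint rule).
    """
    d = side / n                 # edge length of one small cell
    dV = d ** 3                  # volume d^3r of one small cell
    Q = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x, y, z = (i + 0.5) * d, (j + 0.5) * d, (k + 0.5) * d
                Q += rho(x, y, z) * dV
    return Q


# Assumed example density (arbitrary units): charge piles up toward one corner.
def rho(x, y, z):
    return x + y + z


Q = total_charge(rho)
print(Q)  # close to 1.5, the exact integral of x + y + z over the unit cube
```

For a density that varies more wildly you would need more cells, but the idea is exactly the sum of $\rho\,\mathrm{d}^3\boldsymbol r$ over little boxes.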
What charge conservation means is that the only way $Q$ can change with time is if some of the charges inside the box move outside through the surface, or if additional charged particles from outside make their way in. Let’s write $B$ for the boundary surface of the box. These moving charges would constitute an electric current, and so what we need to figure out is, given some current, how much charge is flowing into or out of the region through the boundary $B$ at any moment?
Similar to $\rho$, which measured the amount of charge per volume of space, we measure the amount of current by the current density $\boldsymbol J$. But there are some important differences between $\rho$ and $\boldsymbol J$. Note first of all that $\boldsymbol J = (J_x,J_y,J_z)$ is a vector, because a current can flow in any direction in space. Also, whereas $\rho$ was the amount of charge per unit volume, which we used to find the total amount of charge inside the volume of the box, with $\boldsymbol J$ we want to find the amount of current flowing through the boundary surface $B$. We therefore define $\boldsymbol J(t,\boldsymbol r)$ to be the current per unit area, rather than per unit volume.
Let’s look at the top surface of our box, for example, and see how much current is flowing out through it. Take a little patch of the surface at a point $\boldsymbol r = (x,y,z)$, of width $\mathrm{d}x$ and length $\mathrm{d}y$. Since we want to know how much current is flowing out of the box, what we care about in this case is the $z$ component of $\boldsymbol J$ at that point—$J_x$ and $J_y$ just measure the current flowing parallel to the surface. Then the amount of current passing through that little patch is given by its area $\mathrm{d}x\, \mathrm{d}y$ times $J_z$, the current per unit area in the $z$ direction. We get the total current passing through the whole top surface of the box by integrating over it:
$$ I_\mathrm{top} = \int \mathrm{d}x\,\mathrm{d}y\, J_z(t,\boldsymbol r). $$
At any instant in time $t$, this is the amount of charge per second leaving through the top surface of the box.
Of course, now we need to do the same to find the current passing through the other sides of the box, and then add them all up to get the total current going through the whole surface. In general, our region $R$ needn’t be a neat box and $B$ needn’t be a cubical surface. It could be some misshapen blob instead. But the idea is the same. We slice up the surface into many little patches, each of area $\mathrm{d}a$. Then we multiply that by the current per area flowing out through the patch. That’s given by $\boldsymbol J(t,\boldsymbol r)$ at that point, but again we need to pick out the component that points perpendicular to the surface there in order to obtain the amount of current going out. Let’s write $\hat{{}\boldsymbol n}$ for the unit vector that’s perpendicular to the surface at that point. For example, on the top surface of the box $\hat{{}\boldsymbol n} = \hat{{}\boldsymbol z}$ was a unit vector pointing up in the $z$ direction, on the right surface it would be $\hat{{}\boldsymbol n} = \hat{{}\boldsymbol y}$ pointing to the right in the $y$ direction, and so on. Then we can pick out the perpendicular component of $\boldsymbol J$ by taking the dot product, $\boldsymbol J \cdot \hat{{}\boldsymbol n}$.
All together then, the total current passing out of the boundary surface $B$ is given by integrating over the area of the surface:
$$ I = \int\limits_B \mathrm{d}a ~\boldsymbol{J} \cdot \hat{{}\boldsymbol n}. $$
Which brings us back to charge conservation. $I$ measures the amount of charge per unit time leaving the box (or entering it, if $I$ comes out negative). Local conservation of charge is the statement that if charge $I$ per unit time flows out through the boundary, then the amount of charge $Q$ inside the volume of the box goes down at that same rate:
$$ \frac{\mathrm{d} Q}{\mathrm{d} t } = -I. $$
This is the mathematical statement of charge conservation. Again, the minus is there because we defined positive $I$ to mean that current is flowing out through the boundary, in which case the amount of charge inside the box is decreasing at that same rate.
Spelled out, the charge conservation equation says that
$$ \frac{\mathrm{d} }{\mathrm{d} t } \int\limits_R \mathrm{d}^3 \boldsymbol r~\rho = - \int \limits_B \mathrm{d}a~\boldsymbol J \cdot \hat{{}\boldsymbol n}. $$
If, in particular, we took $R$ to encompass all of space, so that the boundary $B$ is going to infinity, the current density $\boldsymbol J$ had better go to zero there in any physically reasonable setup since there’s nowhere left for the current to flow out to. Then the RHS vanishes, and this equation says that the total charge in all of space is a constant. That’s the statement of global charge conservation. But again, local conservation of charge is a stronger statement: that this equation must hold for any volume $R$ we like. That’s why a charge can’t disappear from Tokyo and reappear at Tau Ceti. Instead of choosing $R$ to fill all of space, just build a box around Tokyo. Then the total charge inside can only change if charged particles are continuously carried in or out through the surface along a current.
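Here's a small numerical sanity check of $\frac{\mathrm{d}Q}{\mathrm{d}t} = -I$ in one space dimension, where the "box" is an interval $[a,b]$ and its "surface" is just the two endpoints. The setup is an assumption of mine for illustration: a Gaussian blob of charge drifting to the right at constant speed $v$, so that $\rho(t,x) = g(x - vt)$ and $J = v\rho$.

```python
import math

V = 2.0            # assumed drift speed of the blob
A, B = -1.0, 1.0   # the "box" is the interval [A, B]


def rho(t, x):
    """Charge density: a Gaussian blob centred at x = V * t."""
    return math.exp(-(x - V * t) ** 2)


def current(t, x):
    """Current density for charge that simply drifts at speed V."""
    return V * rho(t, x)


def charge_in_box(t, n=2000):
    """Q(t): midpoint-rule integral of rho over [A, B]."""
    dx = (B - A) / n
    return sum(rho(t, A + (i + 0.5) * dx) for i in range(n)) * dx


t, h = 0.3, 1e-5
dQ_dt = (charge_in_box(t + h) - charge_in_box(t - h)) / (2 * h)

# Net outflow: +J at the right end (outward normal +x),
# -J at the left end (outward normal -x).
I_out = current(t, B) - current(t, A)

print(dQ_dt, -I_out)  # the two numbers should agree closely
```

The only way the charge in the interval changes is by flowing past one of the two endpoints, which is the one-dimensional version of the Tokyo box.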
On the flip side, we can alternatively take our box $R$ to be an infinitesimally small cube, whose dimensions $\Delta x$, $\Delta y$, $\Delta z$ are going to zero. That lets us turn this integral equation into a differential equation. Let’s first of all bring the $\frac{\mathrm{d} }{\mathrm{d} t }$ inside the integral on the left:
$$ \int\limits_R \mathrm{d}^3 \boldsymbol r~\frac{\partial \rho }{\partial t } = - \int \limits_B \mathrm{d}a~\boldsymbol J \cdot \hat{{}\boldsymbol n}. $$
The only change is that it turns into a partial derivative $\frac{\partial }{\partial t }$, because $\rho(t,\boldsymbol r)$ is a function of both time and space.
Now when we shrink our volume down to a teeny, tiny cube surrounding a point $(x,y,z)$, these integrals become pretty boring. On the left, we just get $\frac{\partial \rho}{\partial t }$ times the volume of the box,
$$ \int\limits_R \mathrm{d}^3\boldsymbol r ~\frac{\partial \rho}{\partial t } = \Delta x \Delta y \Delta z~ \frac{\partial \rho}{\partial t }, $$
the reason being that $\frac{\partial \rho}{\partial t }$ is essentially constant over this infinitesimally small region.
The RHS is slightly more interesting. Take the top surface again, for example. The outward pointing perpendicular direction is going up, so we get $\boldsymbol J \cdot \hat{{}\boldsymbol n} = J_z$ evaluated at the top of the box, and the area is $\Delta x \Delta y$. So the top surface contributes
$$ \Delta x \Delta y~J_z|_\mathrm{top} $$
to the integral. For the bottom surface, on the other hand, the outward direction is pointing down, so for that piece of the integral we get $\boldsymbol J \cdot \hat{{}\boldsymbol n} = -J_z$, and the bottom surface contributes
$$ -\Delta x \Delta y~J_z|_\mathrm{bottom}. $$
Together, we get
$$ \Delta x \Delta y~(J_z|_\mathrm{top} - J_z|_\mathrm{bottom}). $$
In other words, it’s $\Delta x \Delta y$ times $\Delta J_z$—the change in $J_z$ between the top and bottom of the box.
We had a factor of $\Delta x\Delta y\Delta z$ on the LHS of our equation that we’re going to want to cancel out, so let me go ahead and multiply by $\frac{\Delta z}{\Delta z}$ on the right. Then the contribution to the surface integral from the top and bottom of the box is
$$ \Delta x \Delta y \Delta z ~ \frac{\Delta J_z}{\Delta z}. $$
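In the limit where the cube shrinks down to zero size, the ratio $\frac{\Delta J_z}{\Delta z}$ becomes $\frac{\partial J_z}{\partial z }$, so this is the volume of our little bitty cube times the derivative of $J_z$ in the $z$ direction!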
Of course, we also have to include the right and left, and “forward” and “back” surfaces of the box as well. Those give us the derivatives of $J_x$ in the $x$ direction and $J_y$ in the $y$ direction. All together, our charge conservation equation when we shrink the region $R$ down to be infinitesimally small becomes
$$ \Delta x \Delta y \Delta z \frac{\partial \rho}{\partial t } = -\Delta x \Delta y \Delta z \left(\frac{\partial J_x}{\partial x } +\frac{\partial J_y}{\partial y } + \frac{\partial J_z}{\partial z }\right). $$
Cancelling out the volumes, we’re left with a differential relation
$$ \frac{\partial \rho}{\partial t } + \frac{\partial J_x}{\partial x } + \frac{\partial J_y}{\partial y }+ \frac{\partial J_z}{\partial z } = 0. $$
This is the most direct, local statement of electric charge conservation. It’s called the continuity equation, and it’s the prototype for what it means to have a conservation law in any field theory. We usually shorten it by defining a “vector” with the $x$, $y$, and $z$ derivatives
$$ \nabla = \left( \frac{\partial }{\partial x }, \frac{\partial }{\partial y }, \frac{\partial }{\partial z }\right), $$
called “del.” Then the sum of the derivatives of $J_x$, $J_y$, and $J_z$ is just the dot product of $\nabla$ and $\boldsymbol J$, and we can express the continuity equation as
$$ \frac{\partial \rho}{\partial t } + \nabla \cdot \boldsymbol J = 0. $$
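If you'd like to see the continuity equation hold point by point, here's a rough finite-difference sketch. I'm assuming a toy setup that isn't from the lesson: a Gaussian blob of charge drifting rigidly with a constant velocity `VEL`, for which $\boldsymbol J = \rho\, \boldsymbol v$. The function names are hypothetical too.

```python
import math

VEL = (1.0, -0.5, 0.25)   # assumed constant drift velocity (vx, vy, vz)


def rho(t, x, y, z):
    """Gaussian blob of charge whose centre sits at VEL * t."""
    dx, dy, dz = x - VEL[0] * t, y - VEL[1] * t, z - VEL[2] * t
    return math.exp(-(dx * dx + dy * dy + dz * dz))


def J(t, x, y, z):
    """Current density J = rho * velocity for rigidly drifting charge."""
    r = rho(t, x, y, z)
    return (VEL[0] * r, VEL[1] * r, VEL[2] * r)


def continuity_residual(t, x, y, z, h=1e-4):
    """Central-difference estimate of d(rho)/dt + div J at one point."""
    drho_dt = (rho(t + h, x, y, z) - rho(t - h, x, y, z)) / (2 * h)
    dJx_dx = (J(t, x + h, y, z)[0] - J(t, x - h, y, z)[0]) / (2 * h)
    dJy_dy = (J(t, x, y + h, z)[1] - J(t, x, y - h, z)[1]) / (2 * h)
    dJz_dz = (J(t, x, y, z + h)[2] - J(t, x, y, z - h)[2]) / (2 * h)
    return drho_dt + dJx_dx + dJy_dy + dJz_dz


residual = continuity_residual(0.4, 0.3, -0.2, 0.1)
print(residual)  # tiny compared with the individual terms
```

The individual derivatives are not small here; it's only their sum that cancels, which is what the continuity equation demands.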
By the way, what we basically did in our argument here was discover the divergence theorem (because $\nabla \cdot \boldsymbol J$ is called the divergence of $\boldsymbol J$), which you’ll learn about in your math classes:
$$ \int\limits_R \mathrm{d}^3\boldsymbol r~\nabla \cdot \boldsymbol J = \int\limits_B \mathrm{d}a~\boldsymbol J \cdot \hat{{}\boldsymbol n}. $$
It lets us turn the integral of a function like $\boldsymbol J$ over a surface into the integral of the derivatives of $\boldsymbol J$ over the volume inside. Since $R$ was arbitrary for us, we can shrink it down and conclude that the integrands $\frac{\partial \rho}{\partial t }$ and $-\nabla \cdot \boldsymbol J$ on the two sides have to be equal point by point.
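As a quick check of the divergence theorem on the unit cube, here's a minimal sketch. The vector field `J` below is an arbitrary example I picked, and its divergence `div_J` was worked out by hand, so treat both as assumptions rather than anything physical.

```python
def J(x, y, z):
    """Assumed example vector field (not from any physical setup)."""
    return (x * x * y, y * y * z, z * z * x)


def div_J(x, y, z):
    """Divergence of the field above, worked out by hand."""
    return 2 * x * y + 2 * y * z + 2 * z * x


N = 20
d = 1.0 / N
mid = [(i + 0.5) * d for i in range(N)]   # midpoints of the small cells

# Left side: integrate div J over the volume of the unit cube.
volume_integral = sum(div_J(x, y, z) for x in mid for y in mid for z in mid) * d**3

# Right side: add up J . n over the six faces, with n pointing outward.
flux = 0.0
for u in mid:
    for v in mid:
        flux += (J(1.0, u, v)[0] - J(0.0, u, v)[0]) * d * d   # faces x = 1, x = 0
        flux += (J(u, 1.0, v)[1] - J(u, 0.0, v)[1]) * d * d   # faces y = 1, y = 0
        flux += (J(u, v, 1.0)[2] - J(u, v, 0.0)[2]) * d * d   # faces z = 1, z = 0

print(volume_integral, flux)  # both close to 1.5
```

Notice the minus signs on the faces at $0$: that's the outward normal pointing in the negative direction, just like the bottom of our box.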
Okay, now we’ve understood what it means for electric charge to be conserved in electromagnetism. Next we want to understand how all this extends to a more general field theory defined by its action
$$ S = \int\limits_{t_i}^{t_f} \mathrm{d}t\int\limits_{\mathrm{space}}\mathrm{d}^3 \boldsymbol r~\mathcal{L}, $$
given by integrating the Lagrangian density $\mathcal L$ over space and time. And most of all, we want to understand how these conservation laws are related to symmetries by Noether’s theorem. In particular, I hope you’re really curious at this point to discover what symmetry is responsible for the conservation of electric charge!
In the last mini-lesson, we started learning about field theory by studying the simplest example, called the Klein-Gordon theory. It consists of a single free field $\phi(t,\boldsymbol r)$ that assigns a number to each point $\boldsymbol r$ in space at each time $t$, and it’s a great example for learning the fundamentals of field theory. It’s defined by the Lagrangian density
$$ \mathcal{L} = \frac{1}{2c^2} \left( \frac{\partial \phi}{\partial t }\right)^2 - \frac{1}{2} \left( \frac{\partial \phi }{\partial x }\right)^2 - \frac{1}{2} \kappa^2 \phi^2, $$
plus the $y$ and $z$ derivative terms, which I haven’t written out. $c$ here is the speed of light, and $\kappa$ is a parameter that we saw is related to the mass of the particles that you get when you turn this into a quantum theory. Actually, we usually just wind up calling $\mathcal{L}$ the Lagrangian instead of the Lagrangian density, though strictly speaking the Lagrangian is what you get after you integrate $\mathcal{L}$ over space.
By applying the principle of least action to this theory, we found that the equation of motion for $\phi$ is the Klein-Gordon equation:
$$ -\frac{1}{c^2} \frac{\partial^2\phi }{\partial t^2 } + \frac{\partial^2\phi }{\partial x^2 } = \kappa^2 \phi, $$
which is a generalization of the wave equation, and we talked about how we can write the general solution of this equation as a sum of plane waves, $e^{i(kx - \omega t)}$.
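Plugging a single plane wave into the Klein-Gordon equation turns each $t$ derivative into $-i\omega$ and each $x$ derivative into $ik$, so the wave is a solution provided $\omega^2 = c^2(k^2 + \kappa^2)$. Here's a rough numerical check of that claim. The particular values of `C`, `KAPPA`, and `K` are assumptions chosen only for illustration.

```python
import cmath
import math

C, KAPPA = 1.0, 0.7     # assumed values, in units where c = 1
K = 1.3                 # assumed wavenumber
OMEGA = C * math.sqrt(K**2 + KAPPA**2)   # from plugging the wave into the equation


def phi(t, x):
    """Plane wave exp(i(kx - omega t))."""
    return cmath.exp(1j * (K * x - OMEGA * t))


def second_derivative(f, s, h=1e-3):
    """Central-difference estimate of f''(s)."""
    return (f(s + h) - 2 * f(s) + f(s - h)) / h**2


t0, x0 = 0.4, -0.9
d2t = second_derivative(lambda t: phi(t, x0), t0)
d2x = second_derivative(lambda x: phi(t0, x), x0)

lhs = -d2t / C**2 + d2x
rhs = KAPPA**2 * phi(t0, x0)
print(abs(lhs - rhs))  # small: limited only by the finite-difference error
```

If you change `OMEGA` to some other value, the mismatch stops being small, which is one way to see that the frequency isn't free once $k$ is chosen.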
I also showed you last time how to write all this much more compactly using relativistic notation. It makes all the formulas involved in this subject much more neat and concise, but on the other hand, if it’s new to you, then it might backfire and make the equations seem mysterious. So first I’ll work things through with all the $t$’s and $x$’s spelled out, and then afterwards I’ll show you how much simpler things look with a better notation.
When we talked about Noether’s theorem for regular old particle mechanics, what we discovered was that whenever we had a symmetry of the Lagrangian—meaning an infinitesimal transformation that left it invariant—there would be a corresponding conserved quantity that’s constant in time as the particle moves around.
For example, we looked at one problem with a block sitting on a frictionless table attached to a spring that’s pinned down at the other end. This setup does not have translation symmetry: if you pick up the block and slide it over to the right, say, the spring gets stretched and so you’ve changed the system! Then the $x$ and $y$ momenta of the block aren’t conserved, as expected since the spring will pull on the block and accelerate it if you move it away from equilibrium.
On the other hand, the system does have rotational symmetry, because you can pick up the block and rotate it around without changing the length of the spring. The potential energy stored in the spring, $U = \frac{1}{2} k(r-l)^2$, only cares about how far away from the origin the block is, measured by $r = \sqrt{x^2+y^2}$, not the angle $\theta$ that it makes in the $xy$ plane. We showed that this rotational symmetry implies by Noether’s theorem that the angular momentum of the block is conserved.
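For a concrete feel, here's a rough simulation sketch of the block on the spring. The mass, spring constant, rest length, initial conditions, and the choice of a velocity-Verlet integrator are all my assumptions, not part of the original problem.

```python
import math

M, K_SPRING, L0 = 1.0, 1.0, 1.0   # assumed mass, spring constant, rest length


def force(x, y):
    """Spring force -k (r - l) r_hat, pointing along the line to the origin."""
    r = math.hypot(x, y)
    f = -K_SPRING * (r - L0) / r
    return f * x, f * y


def simulate(steps=2000, dt=0.005):
    """Velocity-Verlet integration; returns (p_x, L_z) samples over time."""
    x, y, vx, vy = 1.5, 0.0, 0.0, 0.5      # assumed initial conditions
    fx, fy = force(x, y)
    history = []
    for _ in range(steps):
        vx += 0.5 * dt * fx / M            # half kick
        vy += 0.5 * dt * fy / M
        x += dt * vx                       # drift
        y += dt * vy
        fx, fy = force(x, y)
        vx += 0.5 * dt * fx / M            # half kick
        vy += 0.5 * dt * fy / M
        history.append((M * vx, M * (x * vy - y * vx)))
    return history


history = simulate()
px_values = [p for p, _ in history]
Lz_values = [L for _, L in history]
px_spread = max(px_values) - min(px_values)
Lz_spread = max(Lz_values) - min(Lz_values)
print(px_spread, Lz_spread)  # p_x wanders; L_z stays put up to rounding
```

I picked velocity Verlet because each kick is along the radial direction and each drift is along the velocity, so neither step changes $x v_y - y v_x$. That makes the conserved angular momentum show up cleanly instead of being blurred by integration error.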
We’re going to discover a similar, and even stronger relationship in field theory between symmetries and conservation laws. In fact, the symmetry that’s tied to electric charge conservation is closely analogous to the symmetry of the block on a spring. It’s a rotational symmetry in field space that leads to electric charge conservation.
To see how this works, it’ll actually be more interesting, and more closely analogous to the field theory that describes the electron and electromagnetic force, to study a slight generalization of the Klein-Gordon theory. Instead of a real field $\phi(t,\boldsymbol r)$ that assigns a real number to each point in space at each time, let’s consider a complex field that assigns a complex number to each point, meaning a number $\phi = a + i b$ with a real part and an imaginary part. We define the Lagrangian for the complex field by
$$ \mathcal{L} = \frac{1}{c^2} \frac{\partial \bar\phi}{\partial t }\frac{\partial \phi}{\partial t } - \frac{\partial \bar \phi}{\partial x } \frac{\partial \phi}{\partial x } - \kappa^2 \bar \phi \phi, $$
where it’s conventional to leave out the factors of 1/2 for the complex version. $\bar \phi$ stands for the complex conjugate of $\phi$: $\bar \phi = a - i b$. The Lagrangian is real, though, because
$$ \bar\phi \phi = (a - ib)(a + ib) = a^2+b^2, $$
which is real, and likewise for the other terms in the Lagrangian—all the imaginary cross terms cancel out. If you split up $\phi$ into a real and imaginary part like this, then you can see that this theory is just two copies of our old real Klein-Gordon theory—one for the real part and one for the imaginary part. In the quantum version, we’ll therefore get two kinds of particles, corresponding to a particle and anti-particle pair.
The equation of motion for $\phi$ is still the Klein-Gordon equation,
$$ -\frac{1}{c^2} \frac{\partial^2\phi }{\partial t^2 } + \frac{\partial^2\phi }{\partial x^2 } = \kappa^2 \phi, $$
and likewise we can write the same thing for $\bar\phi$ just by putting a bar on top,
$$ -\frac{1}{c^2} \frac{\partial^2\bar\phi }{\partial t^2 } + \frac{\partial^2\bar\phi }{\partial x^2 } = \kappa^2 \bar\phi. $$
Now, what symmetries does this theory have? Like I mentioned, the one I want to focus on is closely analogous to our block-on-a-spring example from a minute ago. The real and imaginary parts of $\phi = a + ib$ give us a point $(a,b)$ in a 2d plane. In other words, we can think of the complex number $\phi$ like an arrow that goes over to the right by $a$ in the “real direction”, and up in the “imaginary direction” by $b$.
The length of the arrow is $|\phi|=\sqrt{a^2+b^2}$, and it makes an angle $\theta$ with the horizontal axis, so that the lengths of the two sides of the triangle are $a = |\phi| \cos \theta$ and $b = |\phi| \sin \theta.$ But just like the block-on-a-spring, our complex Klein-Gordon Lagrangian only depends on the length $|\phi|$ of the arrow, not on the angle $\theta$ that it makes in this plane.
The reason why is that $\phi$ and $\bar\phi$ always show up paired together in each term of the Lagrangian. The last term, for example, is just $\bar\phi \phi = a^2 + b^2$, i.e. the length-squared of the arrow, $\bar\phi \phi = |\phi|^2$. The same goes for the terms with the derivatives, because again $\phi$ and $\bar\phi$ always appear together.
We therefore learn that this Lagrangian has rotational symmetry, in the sense of the complex $\phi$ plane! This is the same kind of symmetry that leads to electric charge conservation in electromagnetism.
Another way to say the same thing is to use the fact that a complex number $\phi = a + i b$ can equivalently be written as $\phi =|\phi| e^{i \theta}$, where again $|\phi| = \sqrt{a^2+b^2}$ is the magnitude and $\theta$ is the angle of the arrow. That’s thanks to Euler’s identity $e^{i\theta} = \cos \theta + i \sin\theta,$ so that
$$ |\phi| e^{i \theta} = |\phi| \cos \theta +i |\phi| \sin \theta. $$
$|\phi|\cos \theta$ and $|\phi| \sin \theta$ are just the horizontal and vertical components $a$ and $b$ of our arrow, and so this is the same as writing $a+ i b$.
The reason writing things this way is convenient is that it makes it very simple to rotate $\phi = |\phi| e^{i \theta}$ to a new angle: just multiply it by $e^{i \alpha}$ for whatever angle $\alpha$ you want to rotate by,
$$ \phi \to e^{i \alpha}\phi = |\phi|e^{i \alpha}e^{i \theta} = |\phi| e^{i(\theta + \alpha)}, $$
thanks to the fact that $e^Ae^B = e^{A+B}$. So after we multiply by $e^{i \alpha}$, $\phi$ gets rotated around to a new angle $\theta + \alpha$, but the magnitude $|\phi|$ doesn’t change.
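Python's built-in complex numbers make this easy to try out. In this little sketch the value of `phi` and the angle `alpha` are arbitrary choices of mine.

```python
import cmath

phi = 3 + 4j                 # assumed example value: a = 3, b = 4
alpha = 0.5                  # assumed rotation angle in radians

r, theta = cmath.polar(phi)              # magnitude |phi| and angle theta
rotated = cmath.exp(1j * alpha) * phi    # phi -> e^{i alpha} phi
r_new, theta_new = cmath.polar(rotated)

print(r, r_new)                  # same magnitude (5.0, up to rounding)
print(theta_new - theta, alpha)  # the angle has shifted by alpha
```

One caveat: `cmath.polar` reports angles between $-\pi$ and $\pi$, so for larger rotations the difference can come out shifted by $2\pi$.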
The rotational symmetry of our Lagrangian is therefore simply the transformation
$$ \phi \to e^{i \alpha} \phi,\quad \bar \phi \to e^{-i\alpha}\bar\phi. $$
And in this notation it’s even easier to see why the Lagrangian is invariant: since each term has a $\phi$ and a $\bar\phi$, when we make the transformation one picks up a factor of $e^{i\alpha }$ and the other $e^{-i\alpha}$, and when they’re multiplied together they cancel each other out!
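Here's a minimal numerical sketch of that cancellation. I evaluate the Lagrangian density at a single spacetime point, so the inputs are just three complex numbers: the field and its $t$ and $x$ derivatives. Their sample values, the parameters, and the function name `lagrangian` are assumptions for illustration.

```python
import cmath


def lagrangian(phi, dphi_dt, dphi_dx, c=1.0, kappa=0.7):
    """Complex Klein-Gordon Lagrangian density at one spacetime point.

    The barred quantities are obtained by complex conjugation.
    """
    return (dphi_dt.conjugate() * dphi_dt / c**2
            - dphi_dx.conjugate() * dphi_dx
            - kappa**2 * phi.conjugate() * phi)


# Assumed sample values of the field and its derivatives at some point.
phi, dphi_dt, dphi_dx = 0.3 + 1.1j, -0.8 + 0.2j, 0.5 - 0.9j

alpha = 1.234                       # any constant angle
phase = cmath.exp(1j * alpha)

before = lagrangian(phi, dphi_dt, dphi_dx)
# A constant phase passes straight through the derivatives.
after = lagrangian(phase * phi, phase * dphi_dt, phase * dphi_dx)

print(before, after)  # equal up to rounding, with zero imaginary part
```

The step where the phase passes through the derivatives relies on $\alpha$ being the same constant everywhere in space and time, which is what we've assumed for this symmetry.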
This kind of symmetry is called $\mathrm{U}(1)$, where the $\mathrm{U}$ stands for unitary. The terminology comes from the definition of a unitary matrix, which is a matrix $M$ that satisfies the property $\overline M^\mathrm{T} M = 1,$ meaning that if you take the complex conjugate of $M$ and then its transpose, you should get the inverse matrix of $M$. The space of $n\times n$ matrices satisfying this property is called $\mathrm{U}(n)$. For $n = 1$, though, the “matrix” is just a single number, and it has to be of the form $e^{i\alpha}$, just like our rotation factor. That’s because its complex conjugate is $e^{-i\alpha}$, while the transpose doesn’t do anything at all, and indeed $e^{-i\alpha} e^{i\alpha} = 1.$
By the way, the other symbols $\mathrm{SU}(3)$ and $\mathrm{SU}(2)$ in the standard model symmetry group $\mathrm{SU(3)\times SU(2)\times U(1)}$ stand for similar, larger symmetries with $n = 3$ and $n = 2$. The “$\mathrm{S}$” means that in addition to being unitary, these rotation matrices are required to have determinant equal to 1.
Now that we have the symmetry we’re interested in, let’s see how it leads to a conservation law by Noether’s theorem. Remember the basic way that Noether’s theorem worked when we studied it before in particle mechanics. The point was that under an arbitrary transformation with infinitesimal parameter $\varepsilon$, the change in the Lagrangian always takes the form
$$ \mathrm{d}L = (\mathrm{EOM})\varepsilon + \frac{\mathrm{d} }{\mathrm{d} t }Q, $$
where $\mathrm{EOM}$ is the thing that vanishes when the equation of motion is satisfied. If we choose a specific $\varepsilon$ that gives us a symmetry of the Lagrangian, then $\mathrm{d}L = 0$ on the LHS. Then on the physical trajectory, where $\mathrm{EOM} = 0$, this equation tells us that whatever quantity $Q$ appears under the $\frac{\mathrm{d} }{\mathrm{d} t }$ is conserved, $\frac{\mathrm{d}Q }{\mathrm{d} t } = 0$.
We’ll discover a very similar relationship in field theory. The main difference is that $\phi(t,\boldsymbol r)$ now depends on both time and space, and so instead of just getting a $t$ derivative on the RHS, we’ll have a sum of time and space derivatives:
$$ \mathrm{d}\mathcal{L} = (\mathrm{EOM})\varepsilon +\frac{\partial \rho}{\partial t } + \frac{\partial J_x}{\partial x } + \frac{\partial J_y}{\partial y } + \frac{\partial J_z}{\partial z }, $$
for some quantities $\rho$ and $\boldsymbol J = (J_x,J_y,J_z)$ that depend on the transformation we’re making. If we choose a symmetry transformation for which $\mathrm{d}\mathcal{L} = 0$, then when the field satisfies the equation of motion we discover a continuity equation:
$$ \frac{\partial \rho}{\partial t } + \nabla \cdot \boldsymbol J = 0. $$
A local conservation law! This is the way Noether’s theorem guarantees that symmetries lead to conservation laws in field theories.
Let’s work it out for our rotational symmetry of the complex Klein-Gordon Lagrangian,
$$ \mathcal{L} = \frac{1}{c^2} \frac{\partial \bar\phi}{\partial t }\frac{\partial \phi}{\partial t } - \frac{\partial \bar \phi}{\partial x } \frac{\partial \phi}{\partial x } - \kappa^2 \bar \phi \phi. $$
Start with an arbitrary transformation of the field,
$$ \phi \to \phi + \varepsilon, $$
where $\varepsilon$ is an infinitesimal shift. $\varepsilon$ can be anything here, including a function of time and space and even $\phi$ itself.
For any random choice of $\varepsilon$, the Lagrangian certainly isn’t going to be invariant. Let’s see how it changes in general when we make this shift. When we make the substitution $\phi \to \phi + \varepsilon$ in $\mathcal{L}$, leaving $\bar \phi$ alone for the moment,
$$ \mathcal{L} \to \frac{1}{c^2} \frac{\partial \bar\phi}{\partial t }\frac{\partial}{\partial t } (\phi + \varepsilon) - \frac{\partial \bar \phi}{\partial x } \frac{\partial }{\partial x }(\phi + \varepsilon) - \kappa^2 \bar \phi (\phi+\varepsilon), $$
we just get back the original Lagrangian we started with, plus three new terms with $\phi$ replaced by $\varepsilon$. (Again, I’m not bothering to write the $y$ and $z$ terms here since things are already getting complicated enough.)
Then the change in the Lagrangian under this transformation is
$$ \mathrm{d}\mathcal{L} = \frac{1}{c^2} \frac{\partial \bar\phi}{\partial t }\frac{\partial \varepsilon}{\partial t } - \frac{\partial \bar \phi}{\partial x } \frac{\partial \varepsilon}{\partial x } - \kappa^2 \bar \phi \varepsilon. $$
Next up, just like when we learned to apply the principle of least action, we want to integrate by parts on the first two terms to rewrite them like so:
$$ \frac{1}{c^2} \frac{\partial \bar\phi}{\partial t }\frac{\partial \varepsilon}{\partial t } = -\frac{1}{c^2} \frac{\partial^2\bar\phi }{\partial t^2 }\varepsilon + \frac{\partial }{\partial t }\left(\frac{1}{c^2} \frac{\partial \bar\phi}{\partial t } \varepsilon\right) $$
and
$$- \frac{\partial \bar\phi}{\partial x }\frac{\partial \varepsilon}{\partial x } = \frac{\partial^2\bar\phi }{\partial x^2 }\varepsilon - \frac{\partial }{\partial x }\left(\frac{\partial \bar\phi}{\partial x } \varepsilon\right). $$
Then we find that the leading change in the Lagrangian when we make a tiny variation $\phi \to \phi + \varepsilon$ is
$$ \mathrm{d} \mathcal{L} = \left( -\frac{1}{c^2} \frac{\partial^2 \bar\phi }{\partial t^2 } + \frac{\partial^2\bar\phi }{\partial x^2 }- \kappa^2 \bar\phi \right)\varepsilon + \frac{\partial }{\partial t } \left(\frac{1}{c^2} \frac{\partial \bar\phi}{\partial t } \varepsilon\right)- \frac{\partial }{\partial x } \left(\frac{\partial \bar\phi}{\partial x } \varepsilon\right). $$
As promised, the first big quantity in parentheses is just the thing that vanishes when $\bar\phi$ satisfies its Klein-Gordon equation. Indeed, this was almost exactly the procedure we followed to derive the Klein-Gordon equation when we applied the principle of least action last time. The only difference was that in that case we required $\varepsilon$ to vanish at the boundaries of the integral: namely at the initial and final times $t_i$ and $t_f$, and at spatial infinity. That’s because in that context we’re trying to find the field configuration that minimizes the action within the set of fields with fixed boundary conditions. Then when we integrate $\mathrm{d}\mathcal{L}$ to find the change in the action, the second pair of terms with the derivatives drop out—just like we saw with the divergence theorem, the integral basically kills the derivatives and leaves the things in parentheses evaluated at the boundaries, which vanish because $\varepsilon = 0$ there.
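Since the rewriting of $\mathrm{d}\mathcal{L}$ is nothing more than the product rule, it should hold for any smooth functions whatsoever. Here's a rough finite-difference check of that. The two test functions standing in for $\bar\phi$ and $\varepsilon$ are real-valued and completely made up, and the helper names are hypothetical.

```python
import math

C, KAPPA = 1.0, 0.7     # assumed parameter values


# Assumed smooth test functions; any differentiable choices would do.
def phibar(t, x):
    return math.sin(t + 2 * x) + 0.5 * math.cos(3 * t - x)


def eps(t, x):
    return math.exp(-0.3 * x * x) * math.cos(t + x)


def d_dt(f, h=1e-4):
    """Return a function approximating the t-derivative of f(t, x)."""
    return lambda t, x: (f(t + h, x) - f(t - h, x)) / (2 * h)


def d_dx(f, h=1e-4):
    """Return a function approximating the x-derivative of f(t, x)."""
    return lambda t, x: (f(t, x + h) - f(t, x - h)) / (2 * h)


def time_piece(t, x):
    """(1/c^2) * d(phibar)/dt * eps, the thing under the t-derivative."""
    return d_dt(phibar)(t, x) * eps(t, x) / C**2


def space_piece(t, x):
    """-d(phibar)/dx * eps, the thing under the x-derivative."""
    return -d_dx(phibar)(t, x) * eps(t, x)


t0, x0 = 0.4, -0.6

# Left side: the change in the Lagrangian before integrating by parts.
lhs = (d_dt(phibar)(t0, x0) * d_dt(eps)(t0, x0) / C**2
       - d_dx(phibar)(t0, x0) * d_dx(eps)(t0, x0)
       - KAPPA**2 * phibar(t0, x0) * eps(t0, x0))

# Right side: (equation-of-motion piece) * eps + total derivatives.
eom = (-d_dt(d_dt(phibar))(t0, x0) / C**2
       + d_dx(d_dx(phibar))(t0, x0)
       - KAPPA**2 * phibar(t0, x0))
rhs = eom * eps(t0, x0) + d_dt(time_piece)(t0, x0) + d_dx(space_piece)(t0, x0)

print(lhs, rhs)  # agree up to finite-difference error
```

Note that `phibar` here does not solve the Klein-Gordon equation, so `eom` isn't zero. The identity holds regardless; the equation of motion only enters later, when we want the first term to drop out.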
But our identity here for $\mathrm{d}\mathcal{L}$ holds for any transformation $\phi \to \phi+ \varepsilon.$ Of course, $\bar\phi$ will transform as well, in general, by $\bar\phi \to \bar\phi + \bar \varepsilon$ where $\bar\varepsilon$ is the complex conjugate of $\varepsilon.$ That works out in a totally analogous way, and when we add it all up to find the total change in the Lagrangian, we get an equation of the form
$$ \mathrm{d}\mathcal{L} = \mathrm{EOM's} + \frac{\partial \rho}{\partial t } + \frac{\partial J_x}{\partial x } + \frac{\partial J_y}{\partial y } + \frac{\partial J_z}{\partial z }, $$
where I’ve defined
$$ \rho = \frac{1}{c^2} \left(\varepsilon \frac{\partial \bar\phi}{\partial t } + \bar\varepsilon \frac{\partial \phi}{\partial t }\right) $$
and
$$ J_x = -\left( \varepsilon \frac{\partial \bar\phi}{\partial x } + \bar\varepsilon \frac{\partial \phi}{\partial x }\right), $$
and similarly for $J_y$ and $J_z$. There are also cross terms that go like $\varepsilon \bar\varepsilon$ when we plug the transformed fields into $\mathcal{L}$, but remember that we’re only working with infinitesimal transformations here, and we don’t care about any terms with more than one power of $\varepsilon$ and/or $\bar\varepsilon$.
Noether’s theorem is now staring us in the face! This formula for $\mathrm{d}\mathcal{L}$ holds for any transformation $\varepsilon$, $\bar\varepsilon$, but if we now choose a specific symmetry transformation for which $\mathrm{d}\mathcal{L} = 0$, then when the Klein-Gordon equations are satisfied we learn that the corresponding $\rho$ and $\boldsymbol J$ define a conserved charge and current!
We’ve seen that $\mathcal{L}$ is invariant under the rotation
$$ \phi \to e^{i \alpha} \phi,\quad \bar \phi \to e^{-i\alpha}\bar\phi. $$
It’s a symmetry for any angle $\alpha$, but to apply Noether’s theorem we only care about the infinitesimal version. So let’s apply the Taylor series
$$ e^{i \alpha} = 1 + i \alpha +\cdots, $$
keeping only the first interesting term. Then our infinitesimal rotation symmetry is
$$ \phi \to \phi + i \alpha \phi,\quad \bar \phi \to \bar\phi - i \alpha \bar\phi. $$
In other words, this is the transformation with $\varepsilon = i \alpha \phi$ and $\bar\varepsilon = - i \alpha \bar\phi.$ When we plug these into our formulas for $\rho$ and $\boldsymbol J$, we find that the rotation symmetry leads to a local conservation law with
$$ \rho = \frac{i}{c^2} \left( \phi\frac{\partial \bar\phi}{\partial t } - \bar\phi\frac{\partial \phi}{\partial t }\right) $$
and
$$ \boldsymbol J = -i \left(\phi \nabla \bar\phi - \bar\phi \nabla \phi\right), $$
where I’ve dropped the overall factor of $\alpha$, which was just an arbitrary constant. Don’t sweat the $i$’s by the way—the things in parentheses are pure imaginary since they’re each a complex number minus its complex conjugate. Then multiplying by the $i$ out front gives us back a real number.
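To make the reality of $\rho$ explicit, here is a short check. The shorthand $z$ is introduced only for this aside and isn’t part of the lesson’s notation:

$$ z \equiv \phi \frac{\partial \bar\phi}{\partial t },\qquad \bar z = \bar\phi \frac{\partial \phi}{\partial t },\qquad z - \bar z = 2 i\, \mathrm{Im}\, z \quad\Longrightarrow\quad \rho = \frac{i}{c^2}\left(z - \bar z\right) = -\frac{2}{c^2}\, \mathrm{Im}\left(\phi \frac{\partial \bar\phi}{\partial t }\right). $$

The same argument with $\frac{\partial }{\partial t }$ replaced by $\nabla$ gives $\boldsymbol J = 2\, \mathrm{Im}\left(\phi \nabla \bar\phi\right)$, so every component of the current is real too.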
Then Noether’s theorem implies that $\rho$ and $\boldsymbol J$ satisfy
$$ \frac{\partial \rho}{\partial t } + \nabla \cdot \boldsymbol J = 0, $$
and we have a conservation law.
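If you’d like to see the conservation law hold numerically, here is a minimal Python sketch. It rests on several assumptions that aren’t in the lesson: it works in one space dimension, sets $c = 1$, picks an arbitrary value for $\kappa$, and uses an arbitrary superposition of two plane waves. The function names are made up for this illustration. It evaluates $\rho$ and $J_x$ from the formulas above and estimates $\frac{\partial \rho}{\partial t } + \frac{\partial J_x}{\partial x }$ by finite differences, once for a field that solves the Klein-Gordon equation and once for a field that doesn’t.

```python
import cmath
import math

C = 1.0      # speed of light (units chosen so c = 1 for this sketch)
KAPPA = 0.7  # arbitrary illustrative value of the Klein-Gordon parameter kappa


def on_shell_omega(k):
    """Frequency that makes a plane wave solve the Klein-Gordon equation."""
    return C * math.sqrt(k * k + KAPPA * KAPPA)


def make_field(modes):
    """Build phi(t, x) from a list of (amplitude, k, omega) plane waves.

    Returns a function giving (phi, dphi/dt, dphi/dx), with exact derivatives.
    """
    def field(t, x):
        phi = dphi_dt = dphi_dx = 0j
        for amplitude, k, w in modes:
            wave = amplitude * cmath.exp(1j * (k * x - w * t))
            phi += wave
            dphi_dt += -1j * w * wave
            dphi_dx += 1j * k * wave
        return phi, dphi_dt, dphi_dx
    return field


def charge_and_current(field, t, x):
    """rho and J_x from the lesson's formulas (one space dimension)."""
    phi, dphi_dt, dphi_dx = field(t, x)
    rho = (1j / C**2) * (phi * dphi_dt.conjugate() - phi.conjugate() * dphi_dt)
    j_x = -1j * (phi * dphi_dx.conjugate() - phi.conjugate() * dphi_dx)
    return rho.real, j_x.real  # imaginary parts vanish up to rounding


def continuity_residual(field, t, x, h=1e-4):
    """Central-difference estimate of d(rho)/dt + d(J_x)/dx at (t, x)."""
    drho_dt = (charge_and_current(field, t + h, x)[0]
               - charge_and_current(field, t - h, x)[0]) / (2 * h)
    djx_dx = (charge_and_current(field, t, x + h)[1]
              - charge_and_current(field, t, x - h)[1]) / (2 * h)
    return drho_dt + djx_dx


# Two-wave superposition that solves Klein-Gordon (both waves on shell).
solution = make_field([(1.0, 0.5, on_shell_omega(0.5)),
                       (0.6 + 0.3j, -1.3, on_shell_omega(-1.3))])
# Same waves, but the second frequency is deliberately wrong.
non_solution = make_field([(1.0, 0.5, on_shell_omega(0.5)),
                           (0.6 + 0.3j, -1.3, 2.0)])

on_shell_residual = continuity_residual(solution, t=0.3, x=0.2)
off_shell_residual = continuity_residual(non_solution, t=0.3, x=0.2)
print("on shell: ", on_shell_residual)
print("off shell:", off_shell_residual)
```

The first number should come out close to zero, limited only by finite-difference and rounding error, while the second should not. That’s the numerical counterpart of the statement that the current is conserved only when the equations of motion are satisfied.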
Notice how similar the formulas for $\rho$ and $\boldsymbol J$ are here. It’s the same sort of expression, just with $t$ derivatives for $\rho$ and space derivatives for $\boldsymbol J$. Then there’s also an overall sign difference, and the couple of factors of $c$ in $\rho.$ These formulas are screaming out that they should be combined into a four-component spacetime vector.
Remember from last time that we introduced a notation for spacetime coordinates like this
$$ X^\mu = \begin{pmatrix} ct\\x\\y\\z \end{pmatrix}, $$
where $\mu = 0,1,2,3$ is an index. $\mu = 0$ is the “time component” $X^0 = ct$, and $\mu = 1,2,3$ are the space components. The factor of $c$ is there so that all the entries have the same dimensions of length.
Then we can define the derivatives with respect to $X^\mu$ by
$$ \frac{\partial }{\partial X^\mu} = \begin{pmatrix} \frac{1}{c}\frac{\partial }{\partial t }\\ \frac{\partial }{\partial x }\\ \frac{\partial }{\partial y }\\ \frac{\partial }{\partial z } \end{pmatrix}. $$
We usually write this even more simply as $\partial_\mu$.
Now let’s put the charge density $\rho$ and spatial current $\boldsymbol J$ together into the four-component spacetime current
$$ J^\mu = \begin{pmatrix} c\rho\\J_x\\J_y\\J_z \end{pmatrix}. $$
The $c$ is again there so that each component has the same units.
This makes a lot of sense! $\boldsymbol J$ is the amount of current flowing around through space, while $\rho$ is like a current flowing forward through time. For example, even if you just set a charged particle down at rest, so that $\boldsymbol J = 0$, the particle is still moving forward through time, and $c\rho$ is measuring the spacetime current in that direction.
In this notation, the current conservation law is simply
$$ \sum_{\mu = 0}^3 \partial_\mu J^\mu = 0, $$
because if we expand out the LHS we get
$$ \frac{\partial J^0}{\partial X^0 } + \sum_{i=1}^3 \frac{\partial J^i }{\partial X^i }= \frac{1}{c} \frac{\partial }{\partial t }(c \rho) + \nabla \cdot \boldsymbol J, $$
and the $c$’s cancel.
Thus, in our first notation we got the slightly awkward equation that the divergence of $\boldsymbol J$ in space, $\nabla \cdot \boldsymbol J$, is equal to minus the rate of change of the charge density, $-\frac{\partial \rho}{\partial t }$. But in relativistic notation the conservation law is much simpler: the divergence of $J^\mu$ in spacetime must vanish, $\sum \partial_\mu J^\mu = 0$.
The divergence $\nabla \cdot \boldsymbol J$ measures how much $\boldsymbol J$ spreads outward from a given point, like a water sprinkler spraying water out of a spigot. The conservation law says that $J^\mu$ can't have any divergence in spacetime, meaning you're not allowed to simply pop out new charges at any spacetime point. (Or, rather, no net charge. One could have a positive and negative charge pop out of the vacuum, for example, as long as the total charge is balanced.)
In fact, we usually don’t even bother to write sums like $\sum_{\mu=0}^3$. We just adopt the convention that any time an index like $\mu$ appears twice in any given term, as in $\partial_\mu J^\mu$, we sum over all its values. When $\partial_\mu J^\mu = 0$, we say that we have a conserved current.
Putting together the $\rho$ and $\boldsymbol J$ components we found for the rotation symmetry, the spacetime current is simply
$$ J^\mu =i\eta^{\mu\nu}\left(\bar\phi \partial_\nu \phi - \phi \partial_\nu \bar\phi \right), $$
where $\eta$ is the 4x4 diagonal matrix we defined last time:
$$\eta^{\mu\nu}= \begin{pmatrix} -1 & & &\\ & 1 & &\\ & & 1 &\\ & & & 1 \end{pmatrix}. $$
It’s what takes care of the relative signs for us between the time and space terms. For example, we get
$$ \begin{align} J^0 =& i \eta^{00} \left(\bar\phi \partial_0\phi - \phi \partial_0 \bar\phi \right)\notag\\ =& -i\left( \bar\phi \frac{1}{c} \frac{\partial }{\partial t } \phi - \phi \frac{1}{c} \frac{\partial }{\partial t } \bar\phi\right)\notag\\ =& c \rho\notag, \end{align} $$
with the same $\rho$ we found earlier.
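The space components work the same way, except that $\eta^{11} = +1$ leaves the sign alone. For instance, using $\partial_1 = \frac{\partial }{\partial x }$,

$$ J^1 = i \eta^{11}\left(\bar\phi \partial_1 \phi - \phi \partial_1 \bar\phi\right) = i \left(\bar\phi \frac{\partial \phi}{\partial x } - \phi \frac{\partial \bar\phi}{\partial x }\right) = -i\left(\phi \frac{\partial \bar\phi}{\partial x } - \bar\phi \frac{\partial \phi}{\partial x }\right) = J_x, $$

which matches the $x$ component of the $\boldsymbol J$ we found earlier.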
Okay, these formulas are pretty, but you’re probably looking at them and wondering what the heck they mean. What’s the conserved quantity we’re talking about here? Remember that the conserved charge is defined by integrating the charge density $\rho$ over space,
$$ N = \int \mathrm{d}^3\boldsymbol r~ \rho, $$
which I’m calling $N$ now instead of $Q$ for a reason we’ll see in a second.
Also recall that last time we discussed that the general solution to the Klein-Gordon equation can be written as a sum of plane waves $e^{i(kx-\omega t)}$, which turn into the wave functions for particles created by the field in the quantum theory. By plugging this expansion into $N$, you can show that what it does is count the number of particles minus the number of anti-particles! This number is therefore a constant in time.
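As a rough illustration of where the “one thing minus another” structure comes from, take a toy configuration that is assumed here purely for the sake of example: one wave with amplitude $a$ plus its opposite-frequency partner with amplitude $\bar b$. The labels $a$, $b$, and the box volume $V$ below are introduced only for this sketch. Plugging into the formula for $\rho$,

$$ \phi = a\, e^{i(kx - \omega t)} + \bar b\, e^{-i(kx - \omega t)} \quad\Longrightarrow\quad \rho = \frac{i}{c^2} \left( \phi\frac{\partial \bar\phi}{\partial t } - \bar\phi\frac{\partial \phi}{\partial t }\right) = \frac{2\omega}{c^2}\left(|b|^2 - |a|^2\right). $$

The cross terms proportional to $e^{\pm 2i(kx-\omega t)}$ cancel, so $\rho$ is constant, and integrating over a box of volume $V$ gives $N = \frac{2\omega V}{c^2}\left(|b|^2 - |a|^2\right)$: a difference of two positive quantities that doesn’t change with time. Roughly speaking, in the quantum theory these squared amplitudes turn into the particle and anti-particle counts. Which of the two terms gets called “particles” depends on sign conventions, so don’t read too much into the overall sign here.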
But what does all this have to do with conservation of electric charge, where we started at the beginning? Suppose these particles carry electric charge $q$, and therefore the anti-particles carry electric charge $-q.$ Then the total electric charge of a bunch of $N_+$ particles and $N_-$ anti-particles is
$$ q N_+ +(-q)N_- = q (N_+ - N_-) = q N. $$
Thus, the same conserved charge $N$ coming from our rotation symmetry counts the total electric charge $Q = q N$, once it’s multiplied by the unit of electric charge $q$. After multiplying by $q,$ the densities $\rho$ and $\boldsymbol J$ from the rotation symmetry become electric charge and electric current densities. (It's actually slightly more complicated than that in this particular example because the electromagnetic fields themselves modify the current, but let's not worry about that right now!)
Now that we’re talking about things with electric charge though, that means that in addition to our field $\phi$ there are electric and magnetic fields floating around as well! And our Lagrangian had better incorporate all of them. In the last part of the lesson, I want to show you the Lagrangian for the full theory, including the electromagnetic fields. Exploring all the details will really require a separate lesson of its own—or several—so for now I’ll just give you a quick summary.
In our relativistic notation, we can write the Klein-Gordon Lagrangian much more compactly as
$$ \mathcal{L} = - \eta^{\mu\nu} \partial_\mu \bar\phi \partial_\nu \phi - \kappa^2 \bar \phi\phi. $$
Remember that it’s implied that we’re summing over both $\mu$ and $\nu$ here. When we expand out the sum, we get the same Lagrangian as before, where $\eta^{\mu\nu}$ gets all the signs to work out right. I’m also going to work in units where $c = 1$ for the remainder, since that’s the standard convention in field theory and it makes the formulas look simpler.
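For concreteness, here is the sum written out with the factors of $c$ temporarily restored, using $\eta^{00} = -1$, $\eta^{11} = \eta^{22} = \eta^{33} = 1$, and $\partial_0 = \frac{1}{c}\frac{\partial }{\partial t }$:

$$ -\eta^{\mu\nu} \partial_\mu \bar\phi\, \partial_\nu \phi = \partial_0 \bar\phi\, \partial_0 \phi - \sum_{i=1}^3 \partial_i \bar\phi\, \partial_i \phi = \frac{1}{c^2} \frac{\partial \bar\phi}{\partial t } \frac{\partial \phi}{\partial t } - \nabla \bar\phi \cdot \nabla \phi. $$

The time-derivative term enters with a plus sign and the gradient term with a minus sign, which is consistent with the $\frac{1}{c^2}$ in $\rho$ and the overall minus sign in $\boldsymbol J$ that we found above.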
As we’ve seen, $\mathcal{L}$ is invariant under rotations:
$$ \phi \to e^{-i q\alpha}\phi,\quad \bar \phi \to e^{i q \alpha}\bar\phi. $$
I’ve rescaled the parameter $\alpha$ here to $-q \alpha$, because $q$ is going to become the electric charge of $\phi$.
Electromagnetism, and the standard model as a whole, are examples of gauge theories, in which the defining symmetries are promoted to local transformations in spacetime. In other words, the gauge symmetry of electromagnetism is obtained by demanding that the Lagrangian is invariant under this rotation for any choice of $\alpha(X^\mu)$, including one that depends on what point you’re at in space and time.
Our original Lagrangian is not invariant under rotations when $\alpha$ is a function of $X^\mu$. The $\bar\phi \phi$ term is still okay,
$$ \bar\phi\phi \to e^{iq\alpha(X)} \bar\phi e^{-iq\alpha(X)}\phi = \bar\phi \phi, $$
because the factors of $e^{-iq\alpha(X)}$ and $e^{i q \alpha(X)}$ coming from $\phi$ and $\bar\phi$ cancel each other out. But the terms with the derivatives are no longer invariant.
When $\alpha$ was constant, we could pull the rotation right outside the derivative:
$$ \partial_\mu \phi \to \partial_\mu(e^{-iq\alpha}\phi) = e^{-iq\alpha} \partial_\mu\phi. $$
That way the rotations canceled between $\partial_\mu\phi$ and $\partial_\nu \bar\phi$. But when $\alpha(X)$ is a function of $X^\mu$, we’ll get additional terms from the product rule when the derivative hits the $\alpha.$ These don’t cancel out.
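Written out, the product rule gives

$$ \partial_\mu\left(e^{-iq\alpha(X)}\phi\right) = e^{-iq\alpha(X)}\left(\partial_\mu\phi - i q\, (\partial_\mu \alpha)\, \phi\right). $$

The second term, proportional to $\partial_\mu \alpha$, is the leftover piece that has nothing to cancel against in the Lagrangian.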
The way to resolve this problem is to replace the derivative $\partial_\mu\phi$ with the covariant derivative,
$$ D_\mu \phi = \partial_\mu\phi + i q A_\mu \phi, $$
where $A_\mu$ is a new field: the electromagnetic potential. And you’ve likely met at least its first component before:
$$ A_\mu = \left( - V, A_x,A_y,A_z \right). $$
$V$ is the regular old electric potential—i.e. the voltage—and $\boldsymbol A = (A_x,A_y,A_z)$ is the vector potential, which is needed in addition to $V$ once magnetic fields are thrown into the mix. They’re related to the electric and magnetic fields by taking a few derivatives. I’ll hopefully go through those details in a later lesson.
What this covariant derivative does for us is restore the nice transformation property that leaves the Lagrangian invariant. If we require that, whenever our transformation rotates $\phi \to e^{-iq\alpha}\phi$, the symmetry simultaneously acts on $A_\mu$ as
$$ A_\mu \to A_\mu + \partial_\mu \alpha, $$
then the covariant derivative transforms as
$$ D_\mu \phi \to e^{-iq\alpha(X)} D_\mu \phi, $$
just like we previously had with the regular derivative when $\alpha$ was a constant! I’ll leave that as a little exercise for you to check. Then if we replace the ordinary derivatives with covariant derivatives in our original Lagrangian,
$$ \mathcal{L} = -\eta^{\mu\nu}\overline{D_\mu\phi} D_\nu\phi - \kappa^2 \bar\phi \phi, $$
we get a theory that’s invariant even when $\alpha(X)$ is a function of spacetime, because the rotations again come outside the derivatives and cancel against each other.
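If you’d like to compare against your own answer to the exercise, here is one way the check can go, using only the product rule and the two transformation rules quoted above:

$$ D_\mu \phi \to \partial_\mu\left(e^{-iq\alpha}\phi\right) + i q \left(A_\mu + \partial_\mu \alpha\right) e^{-iq\alpha}\phi = e^{-iq\alpha}\left(\partial_\mu \phi - i q (\partial_\mu \alpha) \phi + i q A_\mu \phi + i q (\partial_\mu \alpha)\phi\right) = e^{-iq\alpha} D_\mu \phi. $$

The two $\partial_\mu \alpha$ terms cancel, which is exactly what the shift in $A_\mu$ was designed to do. Taking the complex conjugate gives $\overline{D_\mu \phi} \to e^{iq\alpha}\, \overline{D_\mu\phi}$, so the phases drop out of the product:

$$ \overline{D_\mu\phi}\, D_\nu\phi \to e^{iq\alpha(X)}\, \overline{D_\mu\phi}\; e^{-iq\alpha(X)} D_\nu\phi = \overline{D_\mu\phi}\, D_\nu\phi. $$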
The result of this procedure, which just looks like a trick at first glance but has deep theoretical underpinnings, is that the field $\phi$ now carries electric charge $q$, and $\bar\phi$ carries charge $-q$. The Noether current is now a source for the electromagnetic field, just like any electric charges and currents would produce electric and magnetic fields according to Maxwell’s equations.
But speaking of Maxwell’s equations, there’s still one thing missing from our Lagrangian. Maxwell’s equations determine how electric and magnetic fields are produced by charges and currents, and how they evolve with time. They’re the equations of motion for the electromagnetic potential $A_\mu$, just like the Klein-Gordon equation was the field equation for $\phi.$ Then there should be terms for $A_\mu$ in the Lagrangian that give us Maxwell’s equations when we apply the principle of least action.
Again, I’m just going to quickly tell you the answer right now, and we can explore it more in the future. The Lagrangian for pure electromagnetism is
$$ \mathcal{L}_\mathrm{EM} = -\frac{1}{4} F^{\mu\nu}F_{\mu\nu}, $$
where the electromagnetic field strength $F_{\mu\nu}$ is defined by
$$ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu. $$
It’s a 4x4 matrix that packages up the electric and magnetic fields in a particular way:
$$ F_{\mu\nu} = \begin{pmatrix} 0 & -E_x & - E_y & -E_z\\ E_x & 0 & B_z & -B_y\\ E_y & -B_z & 0 & B_x\\ E_z & B_y & -B_x & 0 \end{pmatrix}. $$
And $F^{\mu\nu}$ with its indices up is another shorthand: it’s what we get by matrix multiplying with $\eta$ to “raise the indices,” $F^{\mu\nu} = \eta^{\mu\rho}F_{\rho\sigma}\eta^{\sigma\nu}.$ It looks like something fancy, but again the point is just to take care of those pesky relative minus signs between time and space terms that always come up in special relativity.
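To see the index gymnastics in action, here is a small Python sketch that builds $F_{\mu\nu}$ from the matrix above, raises its indices with $\eta$, and contracts the two to get $-\frac{1}{4}F^{\mu\nu}F_{\mu\nu}$. The function names and the sample field values are made up for this illustration, and it assumes units with $c = 1$ in which $\boldsymbol E$ and $\boldsymbol B$ have the same dimensions. In such units the contraction should reduce to $\frac{1}{2}\left(|\boldsymbol E|^2 - |\boldsymbol B|^2\right)$, which the last lines compare against.

```python
import math

# Metric with the lesson's signature (-, +, +, +); units with c = 1.
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def field_strength(E, B):
    """Lower-index F_{mu nu}, laid out as in the matrix above."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx],
            [Ez, By, -Bx, 0.0]]


def raise_indices(F):
    """F^{mu nu} = eta^{mu rho} F_{rho sigma} eta^{sigma nu}."""
    return [[sum(ETA[m][r] * F[r][s] * ETA[s][n]
                 for r in range(4) for s in range(4))
             for n in range(4)]
            for m in range(4)]


def em_lagrangian(E, B):
    """-1/4 F^{mu nu} F_{mu nu}, summing over both repeated indices."""
    F = field_strength(E, B)
    F_up = raise_indices(F)
    contraction = sum(F_up[m][n] * F[m][n]
                      for m in range(4) for n in range(4))
    return -0.25 * contraction


# Arbitrary sample field values, chosen only for illustration.
E = (1.0, 2.0, 3.0)
B = (0.5, -1.0, 2.0)

from_contraction = em_lagrangian(E, B)
from_fields = 0.5 * (sum(e * e for e in E) - sum(b * b for b in B))
print(from_contraction, from_fields)
print(math.isclose(from_contraction, from_fields))
```

Notice that raising the indices flips the sign of the electric entries in the first row and column and leaves the magnetic block alone. That’s the source of the relative minus sign between $|\boldsymbol E|^2$ and $|\boldsymbol B|^2$.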
All together then, the Lagrangian for the electromagnetic potential $A_\mu$ and our electrically charged field $\phi$ is
$$ \mathcal{L} = -\overline{D^\mu\phi} D_\mu\phi - \kappa^2 \bar\phi \phi - \frac{1}{4} F^{\mu\nu} F_{\mu\nu}. $$
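It’s worth a quick check that the new term doesn’t spoil the gauge symmetry. Under the same shift $A_\mu \to A_\mu + \partial_\mu \alpha$ that we used for the covariant derivative, the field strength becomes

$$ F_{\mu\nu} \to \partial_\mu\left(A_\nu + \partial_\nu \alpha\right) - \partial_\nu\left(A_\mu + \partial_\mu\alpha\right) = F_{\mu\nu} + \left(\partial_\mu \partial_\nu - \partial_\nu\partial_\mu\right)\alpha = F_{\mu\nu}, $$

because partial derivatives commute. So $F_{\mu\nu}$ doesn’t change at all, and every term in the combined Lagrangian is invariant under the local rotation.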
If all this is new to you, then this is a lot of information to try to process at once. So just take it as a teaser for future lessons where we can dive into more of the details.
Speaking of which, let me finish with one last teaser. The theory we’ve been discussing this lesson, the Klein-Gordon field coupled to electromagnetism, is a good example for starting to learn the ideas of symmetries and gauge theory without too many extra technical complications. It’s also physically relevant: the piece of the standard model Lagrangian that describes the Higgs boson is a generalization of this theory.
But the most fundamental theory of electromagnetism is quantum electrodynamics (QED), the theory of the electron field and the electromagnetic potential, which was the first piece of the standard model to be constructed. The electron is not described by a scalar field $\phi$, though: it’s described by a spinor field $\Psi$. The Lagrangian for QED is
$$ \mathcal{L} = i \bar\Psi \gamma^\mu D_\mu \Psi - m \bar\Psi\Psi - \frac{1}{4} F^{\mu\nu}F_{\mu\nu}. $$
The ideas we’ve learned here go over directly to QED. It starts with the theory of a free electron, which has a $\mathrm{U}(1)$ rotation symmetry. We gauge the symmetry by replacing the ordinary derivative with a covariant one, which means that the field becomes electrically charged. And then we add on the Lagrangian for the electromagnetic field itself.
The other factors of the standard model generalize this procedure, with more fields and a larger gauge symmetry, $\mathrm{SU(3)\times SU(2) \times U(1)}.$ Things get quite a bit more complicated though when the symmetry gets bigger than $\mathrm{U}(1)$! But the same basic principle of gauge symmetry persists and is fundamental to the construction of everything we know about particle physics.
If you encounter any errors on this page, please let me know at feedback@PhysicsWithElliot.com.