The First Thing You’ll Learn in a String Theory Class

String theory has a reputation for being a very challenging subject (and when you get deep into the details, it is!), but the basic idea is very natural. It's a fairly straightforward generalization of what you've been learning if you've followed the last few lessons I've shared about the principle of least action for a particle in Einstein's theory of relativity.

We've been learning that the action for a particle in Einstein's theory has a very simple and geometric interpretation. As a particle travels around through spacetime, it traces out a curve that's called its worldline. Then the action is equal, up to some factors, to the length of the worldline, and the principle of least action says that the particle will choose the path of extremal length in traveling between two events.

String theory replaces the fundamental role of a particle with a tiny loop of string. Whereas a particle traces out a one-dimensional curve as it moves through spacetime, a string traces out a two-dimensional surface. The particle's curve we called the worldline; the string's surface we call the worldsheet.

Picture something like the surface of a bubble as you wave a bubble wand around through the air. The rim of the wand is the string in this analogy, and the bubble that's trailed out behind it as you wave the wand is the worldsheet.

Now we want to write down a principle of least action for this string. If the natural action for a particle was simply the length of its worldline, then the most obvious generalization for the string is the area of its worldsheet.

So in this lesson, I'll show you how we can express the action for a string as the area of the worldsheet that it traces out in spacetime, and we'll learn some very cool math about the geometry of surfaces along the way.

First of all, let's forget about string theory and Minkowski spacetime and all that, and just figure out the little bit of math that we need to compute the area of a surface. Picture that bubble wand again. As you wave it around through the air, the bubble that trails behind it is a 2D surface embedded in 3D space.

Let's write the coordinates of space as $\vec{{}X} = (x, y, z)$. If we'd had a line in space instead of a surface, we would describe it by specifying a curve $\vec {{}X}(\lambda)$, where $\lambda$ is some parameter along the curve. This function tells us how each point $\lambda$ in parameter space gets mapped to a point $\vec{{}X}(\lambda)$ in 3D space.

Now when we graduate to our 2D surface instead of a curve, we need another parameter, call it $\sigma$, say. Then we specify the surface by a function $\vec{{}X}(\sigma,\lambda)$ that tells us how each point $(\sigma,\lambda)$ in the now 2D parameter space gets mapped to a point $\vec{{}X}(\sigma,\lambda)$ in 3D space.

Think of the $\sigma$ direction as a circle and the $\lambda$ direction as a line, so that together the parameter space is the surface of a cylinder. When $\vec{{}X}(\sigma,\lambda)$ maps the cylinder into 3D space, it can get warped around to make a curvy surface like, well, a bubble.

So the question we need to answer is, if someone hands us a surface by writing down its function $\vec{{}X}(\sigma,\lambda)$, how do we compute its area? Think about a little rectangular area of the parameter space, of width $\mathrm{d}\sigma$ and height $\mathrm{d}\lambda$. That little region will map to another little region of the surface in 3D space, of some tiny area $\mathrm{d}a$. This region doesn't have to be a rectangle anymore, since the map by $\vec{{}X}$ can distort the shape, so in general it will be some parallelogram.

The length of the sides of this parallelogram should be fixed by the lengths $\mathrm{d}\sigma$ and $\mathrm{d}\lambda$ that we started with, along with the given map $\vec{{}X}$. What are they? The bottom left corner of the rectangle was at $(\sigma,\lambda)$, and gets mapped to $\vec{{}X}(\sigma,\lambda)$. The bottom right corner was at $(\sigma + \mathrm{d}\sigma, \lambda)$, and gets mapped to $\vec{{}X}(\sigma +\mathrm{d}\sigma,\lambda)$. So we can draw a vector along the "$\sigma$" side of the parallelogram from $\vec{{}X}(\sigma,\lambda)$ to $\vec{{}X}(\sigma + \mathrm{d}\sigma,\lambda)$:

$$\vec{{}X}(\sigma + \mathrm{d}\sigma,\lambda) - \vec{{}X}(\sigma,\lambda). $$

If we divide this vector by $\mathrm{d}\sigma$, then we just get the derivative of $\vec{{}X}(\sigma,\lambda)$ in the direction of $\sigma$:

$$\frac{\vec{{}X}(\sigma + \mathrm{d}\sigma,\lambda) - \vec{{}X}(\sigma,\lambda)}{\mathrm{d}\sigma} = \frac{\partial \vec{{}X} }{\partial \sigma }.$$

The curly $\partial$'s here stand for partial derivatives; if you haven't seen them in your math classes before don't worry too much about it. They're just like regular old derivatives, except that our function $\vec{{}X}(\sigma,\lambda)$ depends on more than one variable, so we use the partial symbol to indicate that we're only looking at the rate of change with respect to one of them.

So, we have a vector pointing along one side of the parallelogram given by

$$\mathrm{d}\sigma \frac{\partial \vec{{}X} }{\partial \sigma }$$

and likewise the vector along the other side will be

$$\mathrm{d}\lambda \frac{\partial \vec{{}X}}{\partial \lambda }.$$

And now we just have a little geometry problem. If you have two vectors, $\vec{{}A}$ and $\vec{{}B}$, and you want to know the area of the parallelogram they make, how do you get it? It's the length of one side, $A$ say, times the height of the parallelogram, which is $B \sin(\theta)$, where $\theta$ is the angle between the two vectors. So the area of the parallelogram is

$$\mathrm{area}=AB \sin(\theta),$$

which you might recognize as the magnitude of the cross product $\vec{{}A}\times \vec{{}B}$. We can rewrite this more conveniently as

$$\mathrm{area}^2 = A^2B^2(1 - \cos^2(\theta)) = A^2B^2 - (\vec{{}A}\cdot \vec{{}B})^2,$$

where I used the dot product $\vec{{}A}\cdot \vec{{}B} = A B \cos (\theta)$. So, the area of a parallelogram spanned by two vectors $\vec{{}A}$ and $\vec{{}B}$ is

$$\mathrm{area} = \sqrt{|\vec{{}A}|^2 |\vec{{}B}|^2 - (\vec{{}A}\cdot \vec{{}B})^2},$$

where $|\vec{{}A}|^2 = \vec{{}A} \cdot \vec{{}A}$ denotes the magnitude squared of $\vec{{}A}$.
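If you'd like to convince yourself of this formula numerically, here is a minimal Python sketch that compares it against the magnitude of the cross product. The two vectors and the helper names (`dot`, `cross`, `parallelogram_area`) are arbitrary choices of mine for illustration.

```python
import math

def dot(a, b):
    """Ordinary Euclidean dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def parallelogram_area(a, b):
    """Area from sqrt(|A|^2 |B|^2 - (A.B)^2), using only dot products."""
    return math.sqrt(dot(a, a) * dot(b, b) - dot(a, b) ** 2)

# Two arbitrary example vectors (illustrative values only).
A = (1.0, 2.0, 0.5)
B = (-0.3, 1.0, 2.0)

area_from_dots = parallelogram_area(A, B)
c = cross(A, B)
area_from_cross = math.sqrt(dot(c, c))

# The two numbers should agree up to floating-point rounding.
print(area_from_dots, area_from_cross)
```

The advantage of the dot-product version is that it never uses the cross product, which only exists in 3D. That is what lets us carry the formula over to spacetime later.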

Back on our surface, the area of our little piece of the bubble spanned by $\vec{{}A} = \mathrm{d}\sigma \frac{\partial \vec{{}X}}{\partial \sigma }$ and $\vec{{}B} = \mathrm{d}\lambda \frac{\partial \vec{{}X} }{\partial \lambda }$ is

$$\mathrm{d}a = \sqrt{ \left| \mathrm{d}\sigma \frac{\partial \vec{{}X} }{\partial \sigma } \right|^2\left| \mathrm{d}\lambda \frac{\partial \vec{{}X} }{\partial \lambda } \right|^2- \left( \mathrm{d}\sigma \frac{\partial \vec{{}X} }{\partial \sigma } \cdot \mathrm{d}\lambda \frac{\partial \vec{{}X} }{\partial \lambda} \right)^2 }.$$

This looks like a bit of a mess, but we can simplify it a lot. First of all, each term has a factor of $(\mathrm{d}\sigma \mathrm{d}\lambda)^2$; let's pull that outside the square-root:

$$\mathrm{d}a = \mathrm{d}\sigma\mathrm{d}\lambda \sqrt{ \left| \frac{\partial \vec{{}X} }{\partial \sigma } \right|^2 \left| \frac{\partial \vec{{}X} }{\partial \lambda } \right|^2- \left( \frac{\partial \vec{{}X} }{\partial \sigma } \cdot \frac{\partial \vec{{}X} }{\partial \lambda} \right)^2 }.$$

Now let's define a $2\times 2$ matrix $h$ like so:

$$h = \begin{pmatrix} \displaystyle \left|\frac{\partial \vec{{}X}}{\partial \sigma }\right|^2 & \displaystyle \frac{\partial \vec{{}X}}{\partial \sigma } \cdot \frac{\partial \vec{{}X} }{\partial \lambda }\\ &\\ \displaystyle \frac{\partial \vec{{}X}}{\partial \sigma }\cdot \frac{\partial \vec{{}X}}{\partial \lambda } & \displaystyle \left| \frac{\partial \vec{{}X}}{\partial \lambda }\right|^2 \end{pmatrix}.$$

Then notice that the quantity inside the square-root is nothing but the determinant of this matrix! So we can write our formula for the little bit of area $\mathrm{d}a$ much more compactly as

$$\mathrm{d}a = \mathrm{d}\sigma \mathrm{d}\lambda \sqrt{\mathrm{det}(h)}.$$

Areas (and volumes and so on) can always be written like this. $h$ is the metric on the surface, just like the metrics $\eta_{\mu\nu}$ and $g_{\mu\nu}$ that we encountered in the previous lessons on special and general relativity. Metrics, remember, tell us how to measure distances in a given space, so naturally they also tell us how to measure areas. The result is that the area is determined by the square-root of the determinant of the metric. To get the total area, we sum up all these little parallelograms:

$$\int \mathrm{d}a = \int \mathrm{d}\sigma \mathrm{d}\lambda \sqrt{\det(h)}.$$
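To see the formula in action, here is a rough numerical sketch in Python. It approximates the partial derivatives by finite differences, builds $\det(h)$ at the center of each little cell of parameter space, and adds up the pieces. The test surface (a cylinder of radius `R` and height `H`), the grid size `n`, and the step `eps` are all assumptions I picked for illustration; the answer should come out close to the textbook value $2\pi R H$.

```python
import math

def dot(a, b):
    """Ordinary Euclidean dot product."""
    return sum(x * y for x, y in zip(a, b))

def partial(X, sigma, lam, which, eps=1e-6):
    """Central-difference estimate of dX/dsigma or dX/dlambda."""
    if which == "sigma":
        hi, lo = X(sigma + eps, lam), X(sigma - eps, lam)
    else:
        hi, lo = X(sigma, lam + eps), X(sigma, lam - eps)
    return [(p - q) / (2 * eps) for p, q in zip(hi, lo)]

def surface_area(X, sigma_range, lam_range, n=100):
    """Sum d(sigma) d(lambda) sqrt(det h) over an n-by-n grid of cells."""
    s0, s1 = sigma_range
    l0, l1 = lam_range
    ds = (s1 - s0) / n
    dl = (l1 - l0) / n
    total = 0.0
    for i in range(n):
        s = s0 + (i + 0.5) * ds          # midpoint of the cell in sigma
        for j in range(n):
            l = l0 + (j + 0.5) * dl      # midpoint of the cell in lambda
            Xs = partial(X, s, l, "sigma")
            Xl = partial(X, s, l, "lambda")
            det_h = dot(Xs, Xs) * dot(Xl, Xl) - dot(Xs, Xl) ** 2
            # Guard against tiny negative values caused by rounding.
            total += math.sqrt(max(det_h, 0.0)) * ds * dl
    return total

# Example surface: a cylinder, with sigma the angle and lambda the height.
R, H = 2.0, 3.0

def cylinder(sigma, lam):
    return (R * math.cos(sigma), R * math.sin(sigma), lam)

area = surface_area(cylinder, (0.0, 2 * math.pi), (0.0, H))
print(area, 2 * math.pi * R * H)
```

For the cylinder, $\sqrt{\det(h)} = R$ at every point, so the sum is easy to check by hand. Swapping in a more warped surface only requires changing the function passed as `X`.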

Okay, that was the math that we needed to cover. Now let's get back to the physics. We were thinking here of a bubble getting traced out in 3D space as you wave the wand around. Now it's a short step to the worldsheet of a little loop of string that gets traced out as it evolves in spacetime.

We only need to make a couple of modifications here to write down the area of the worldsheet. First of all, instead of the 3-component vector $\vec {{}X}$ giving coordinates in 3D space, we need the 4-component vector $X^\mu$ that gives coordinates on spacetime, just like we used in the previous mini lesson on relativity. Likewise, we need to replace the familiar "Pythagorean" notion of distance in regular space with the Minkowski metric $\eta_{\mu\nu}$ in spacetime. (We could also pick a curved metric $g_{\mu\nu}$ like in general relativity, to describe a string in a curved spacetime, but let's stick to flat spacetime here to keep things simple.)

That affects all the places we computed the magnitude and dot products of vectors. So our new matrix $h$ is

$$h = \begin{pmatrix}\displaystyle \eta_{\mu\nu} \frac{\partial X^\mu}{\partial \sigma } \frac{\partial X^\nu}{\partial \sigma } &\displaystyle \eta_{\mu\nu} \frac{\partial X^\mu}{\partial \sigma } \frac{\partial X^\nu}{\partial \lambda }\\ &\\ \displaystyle \eta_{\mu\nu} \frac{\partial X^\mu}{\partial \sigma } \frac{\partial X^\nu}{\partial \lambda } &\displaystyle \eta_{\mu\nu} \frac{\partial X^\mu}{\partial \lambda } \frac{\partial X^\nu}{\partial \lambda } \end{pmatrix}.$$

One last thing: when we wrote down the length of the particle's worldline, $\int \sqrt{-\mathrm{d}s^2}$, we had to flip the sign of $\mathrm{d}s^2$ before we took the square-root because it was negative. The same goes for the determinant of $h$ in Minkowski spacetime: the worldsheet has one timelike direction and one spacelike direction, so $\det(h)$ is negative. So our formula for the area of the worldsheet that the string sweeps out in spacetime is

$$\mathrm{area} = \int \mathrm{d}\sigma \mathrm{d}\lambda\sqrt{-\mathrm{det}(h)}.$$
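As a quick sanity check, here is a small Python sketch that evaluates $\sqrt{-\det(h)}$ for the simplest worldsheet I can think of: a circular loop of radius `R` that just sits still while time passes. I'm assuming the signature $(-,+,+,+)$ for $\eta_{\mu\nu}$ and taking $X^0 = ct = \lambda$; the loop is only a convenient test surface for the area formula, not a claim about how a real string moves.

```python
import math

# Diagonal entries of the Minkowski metric, assumed signature (-, +, +, +).
ETA = (-1.0, 1.0, 1.0, 1.0)

def minkowski_dot(u, v):
    """eta_{mu nu} u^mu v^nu for two 4-component vectors."""
    return sum(e * a * b for e, a, b in zip(ETA, u, v))

def area_density(dX_dsigma, dX_dlambda):
    """sqrt(-det h) built from the two tangent vectors of the worldsheet."""
    h_ss = minkowski_dot(dX_dsigma, dX_dsigma)
    h_sl = minkowski_dot(dX_dsigma, dX_dlambda)
    h_ll = minkowski_dot(dX_dlambda, dX_dlambda)
    det_h = h_ss * h_ll - h_sl ** 2
    return math.sqrt(-det_h)

# Static loop: X^mu(sigma, lambda) = (lambda, R cos(sigma), R sin(sigma), 0).
R = 1.5
sigma = 0.7  # any point on the loop gives the same answer

dX_dsigma = (0.0, -R * math.sin(sigma), R * math.cos(sigma), 0.0)
dX_dlambda = (1.0, 0.0, 0.0, 0.0)

density = area_density(dX_dsigma, dX_dlambda)

# The density is constant, so the area swept out over a span L of lambda
# is just (circumference) x L.
L = 4.0
area = density * 2 * math.pi * L
print(density, area)
```

Here $h$ is diagonal with entries $R^2$ and $-1$, so $\det(h) = -R^2$ and the square-root gives back $R$: the worldsheet is a cylinder in spacetime whose area is the circumference of the loop times the elapsed $ct$.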

This is a generalization of what we learned before for the length of the worldline. In that case, we had a curve $X^\mu(\lambda)$ instead of a surface $X^\mu(\sigma,\lambda).$ Then the $2\times 2$ matrix $h$ is reduced to a single entry, $\eta_{\mu\nu}\frac{\partial X^\mu}{\partial \lambda }\frac{\partial X^\nu}{\partial \lambda }$. The "determinant" of a $1\times 1$ matrix doesn't do anything at all; it just returns that same number. And so the length of the worldline was

$$\mathrm{length} = \int \mathrm{d} \lambda \sqrt{-\eta_{\mu\nu}\frac{\partial X^\mu }{\partial \lambda } \frac{\partial X^\nu }{\partial \lambda }},$$

just like we wrote down last time.
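Here is the same idea as a short Python sketch for the one-dimensional case: a particle moving at constant speed `v` along the $x$ axis, with $\lambda$ taken to be the time $t$. The signature $(-,+,+,+)$, the units with $c = 1$, and the specific numbers are assumptions for illustration. The length should come out close to $cT\sqrt{1 - v^2/c^2}$, which is $c$ times the proper time.

```python
import math

# Diagonal entries of the Minkowski metric, assumed signature (-, +, +, +).
ETA = (-1.0, 1.0, 1.0, 1.0)

def worldline_length(dX_dlambda, lam_start, lam_end, n=1000):
    """Midpoint-rule estimate of the integral of sqrt(-h) d(lambda),
    where h is the single entry eta_{mu nu} X'^mu X'^nu."""
    dlam = (lam_end - lam_start) / n
    total = 0.0
    for i in range(n):
        lam = lam_start + (i + 0.5) * dlam
        u = dX_dlambda(lam)
        h = sum(e * a * a for e, a in zip(ETA, u))
        total += math.sqrt(-h) * dlam
    return total

c = 1.0   # units where the speed of light is 1 (an assumption)
v = 0.6   # particle speed, as a fraction of c
T = 10.0  # elapsed coordinate time

def tangent(lam):
    # X^mu(lambda) = (c*lambda, v*lambda, 0, 0), so the derivative is constant.
    return (c, v, 0.0, 0.0)

length = worldline_length(tangent, 0.0, T)
expected = c * T * math.sqrt(1.0 - (v / c) ** 2)
print(length, expected)
```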

The action for our point particle was $-mc$ times this length. Those constants had to be there to get the units right: the action has units of energy times time, or $\mathrm{kg \cdot m^2/s}$. So $mc$ times the length of the worldline has units

$$\mathrm{kg \cdot \frac{m}{s} \cdot m = kg \cdot \frac{m^2}{s}} $$

like we wanted.

As for our string, we can write the action as

$$S = - \frac{T}{c} \int \mathrm{d}\sigma \mathrm{d}\lambda \sqrt{-\det(h)},$$

where $T$ is the tension in the string. The units are

$$\mathrm{\frac{N}{m/s} \cdot m^2 = N \cdot m \cdot s}.$$

Newtons times meters give us energy, so this is energy times time, and we indeed get the right units for the action.
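If you like to double-check unit bookkeeping by machine, here is a tiny Python sketch. It represents each unit as a tuple of exponents of (kg, m, s) and multiplies units by adding exponents. The representation and the names (`mul`, `inv`, `NEWTON`, and so on) are my own illustrative choices, not a standard library.

```python
def mul(*dims):
    """Multiply units by adding their (kg, m, s) exponents."""
    return tuple(sum(d[i] for d in dims) for i in range(3))

def inv(d):
    """Invert a unit by negating its exponents."""
    return tuple(-x for x in d)

KG = (1, 0, 0)
M = (0, 1, 0)
S = (0, 0, 1)

SPEED = mul(M, inv(S))                 # m / s
NEWTON = mul(KG, M, inv(S), inv(S))    # kg m / s^2
JOULE = mul(NEWTON, M)                 # N m
ACTION = mul(JOULE, S)                 # energy x time = kg m^2 / s

# Particle: (mass) x (speed of light) x (length of worldline).
particle_action = mul(KG, SPEED, M)

# String: (tension / speed of light) x (area of worldsheet).
string_action = mul(NEWTON, inv(SPEED), M, M)

print(particle_action, string_action, ACTION)
```

All three tuples come out as `(1, 2, -1)`, that is, $\mathrm{kg \cdot m^2/s}$.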

So there we have it! This is the action for a relativistic string. The principle of least action says that the string will evolve along the worldsheet with extremal area, similar to how a particle picks the worldline with extremal length.

You can derive the resulting equation of motion for the string in the usual way, by making a small variation of the surface $X^\mu$ and demanding that the action doesn't change to leading order. The solutions are called harmonic functions, which are very special and show up in a wide variety of contexts.

So that's the beginning of string theory: the action we've written down is called the Nambu-Goto action for a relativistic string. And it's safe to say that it's just the tip of the iceberg for string theory. But if you've stuck with me this far, you've already got a good head start on the subject.