Measurement and evolution

In an earlier post, we sketched the basic mathematical description of quantum mechanics, culminating in the general description of quantum states as (reduced) density matrices. We also claimed that generic measurements are not orthogonal projections, and evolution is not unitary. We shall here expand upon the aforementioned infrastructure to explain these statements, resolving some unanswered questions in the process. We shall again draw from Preskill’s Quantum Information and Computation course notes, as well as a lecture given by Mario Flory on POVMs and superoperators.

The naïve picture is that, as a consequence of Schmidt decomposition, one can write the density matrix for a mixed state as an ensemble of orthogonal pure states, the eigenvalues of which are interpreted as the probability of their occurring. When we measure the system, we project onto one of these eigenstates, hence the notion of measurements as orthogonal projections. And indeed this works fine for isolated systems; but as explained previously, this is an idealization. The problem that demands a more generalized notion of measurement is that an orthogonal measurement in a tensor product {\mathcal{H}_A\otimes\mathcal{H}_B} is not necessarily orthogonal if we restrict to subsystem {A} alone.

Let us first make the notion of orthogonal projections a bit more precise, following von Neumann’s treatment thereof. To perform a measurement of an observable {M}, we couple the system to some classical pointer variable that we can actually observe, in the literal sense of the word. In particular, we assume that the pointer is sufficiently heavy that the spreading of its wavepacket can be neglected during the measurement process (it is classical, after all). The Hamiltonian describing the interaction of the pointer with the system is then approximated by {H=\lambda MP}, where {\lambda} is the coupling between the pointer’s momentum {P} and the observable under study. The time evolution operator is therefore

\displaystyle U(t)=\mathrm{exp}\left(-i\lambda tMP\right)=\sum_i\left|i\right>\mathrm{exp}\left(-i\lambda t M_iP\right)\left<i\right|~, \ \ \ \ \ (1)

where in the second equality we’ve expanded {M} in the diagonal basis, {M=\sum_i\left|i\right>M_i\left<i\right|}. (Note that we are implicitly assuming that either {\left[M,H_0\right]=0}, where {H_0} is the original, unperturbed Hamiltonian, or that the measurement occurs so quickly that free evolution of the system can be neglected throughout. We’re also suppressing hats/bold-print on the operators, since this is clear from context).

Since {P=-i\partial_x} is the generator of translations for the pointer, it shifts the position-space wavepacket thereof by some amount {x_0}: {e^{-ix_0P}\psi(x)=\psi(x-x_0)}. Thus, if the system is initially in a superposition of {M} eigenstates unentangled with the state of the pointer {\left|\psi(x)\right>}, then after time {t} it will evolve to

\displaystyle U(t)\left(\sum_i\alpha_i\left|i\right>\otimes\left|\psi(x)\right>\right) =\sum_i\alpha_i\left|i\right>\otimes\left|\psi\left( x-\lambda tM_i\right)\right>~. \ \ \ \ \ (2)

Now the position of the pointer is correlated with the value of the observable {M}. Thus, provided the pointer’s wavepacket is sufficiently narrow that we can resolve all values of {M_i} (namely, {\Delta x\lesssim\lambda t\Delta M_i}, which can be guaranteed by making the pointer sufficiently massive since {\Delta x\gtrsim1/\Delta p=(mv)^{-1}}), observing that the position of the pointer has shifted by {\lambda tM_i} is tantamount to measuring the eigenstate {\left|i\right>}, which occurs with probability {\left|\alpha_i\right|^2}. In this manner, the initial state of the quantum system, call it {\left|\phi\right>}, is projected to {\left|i\right>} with probability {\left|\left<i|\phi\right>\right|^2}. This is von Neumann’s model of orthogonal measurement, which involves so-called projection-valued measures, or PVMs.
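As a rough numerical illustration of (2), the following sketch applies {e^{-ix_0P}} to a Gaussian pointer wavepacket in momentum space and checks that the packet is displaced by {\lambda tM_i}. The grid size, the width `sigma`, the value of `lam_t`, and the two eigenvalues in `M_eigs` are all hypothetical choices made for the example, not values taken from the discussion above.

```python
import numpy as np

# Periodic position grid for the pointer (all numbers here are illustrative).
N, L = 1024, 40.0
dx = L / N
x = -L / 2 + dx * np.arange(N)
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)  # momentum grid conjugate to x

# Narrow Gaussian pointer wavepacket centred at x = 0, normalised on the grid.
sigma = 0.5
psi = np.exp(-x**2 / (4 * sigma**2))
psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)

def translate(psi, x0):
    """Apply exp(-i x0 P), with P = -i d/dx, by multiplying in momentum space."""
    return np.fft.ifft(np.fft.fft(psi) * np.exp(-1j * k * x0))

lam_t = 1.0            # the product lambda * t (assumed value)
M_eigs = [-5.0, 5.0]   # two assumed eigenvalues M_i of the observable

# Pointer states correlated with each eigenvalue, as in equation (2).
shifted = [translate(psi, lam_t * m) for m in M_eigs]
peaks = [x[np.argmax(np.abs(s)**2)] for s in shifted]

# Overlap of the two pointer states: tiny when the packets are well resolved.
overlap = np.sum(np.conj(shifted[0]) * shifted[1]) * dx
```

With these numbers the separation of the two pointer positions is much larger than the packet width, so the overlap is negligible and the two values of {M_i} can be resolved.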

Of course, in principle the measurement process could project onto some superposition of eigenstates, rather than a single eigenstate of {M} as in the above example. Indeed, if we can couple any observable to a pointer, then we can perform any orthogonal projection in Hilbert space. Thus to formulate the above more generally, consider a set of mutually orthogonal projection operators {P_a} such that {\sum_aP_a=1}. Carrying out the measurement procedure above takes the initial (pure) state {\left|\phi\right>\left<\phi\right|} to

\displaystyle \frac{P_a\left|\phi\right>\left<\phi\right|P_a}{\left<\phi|P_a|\phi\right>} \ \ \ \ \ (3)

with probability

\displaystyle \mathrm{Prob}(a)=\left<\phi|P_a|\phi\right>~, \ \ \ \ \ (4)

as usual.
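For concreteness, here is a minimal sketch of (3) and (4) for a single qubit. The state `phi` and the choice of computational-basis projectors are hypothetical, chosen only so that the numbers are easy to check by hand.

```python
import numpy as np

# Hypothetical example state |phi> = sqrt(0.2)|0> + sqrt(0.8) e^{i pi/3}|1>.
phi = np.array([np.sqrt(0.2), np.sqrt(0.8) * np.exp(1j * np.pi / 3)])

# Orthogonal projectors onto the computational basis; they sum to the identity.
basis = np.eye(2)
projectors = [np.outer(basis[a], basis[a].conj()) for a in range(2)]

def pvm_measure(phi, projectors):
    """Return (Prob(a), post-measurement density matrix) for each outcome a.

    Assumes every outcome has nonzero probability, so the division is safe.
    """
    rho = np.outer(phi, phi.conj())
    results = []
    for P in projectors:
        prob = np.real(np.vdot(phi, P @ phi))  # equation (4): <phi|P_a|phi>
        post = P @ rho @ P / prob              # equation (3)
        results.append((prob, post))
    return results

results = pvm_measure(phi, projectors)
probs = [p for p, _ in results]
```

Here `probs` comes out as 0.2 and 0.8, and each post-measurement state is just the corresponding projector, so repeating the measurement returns the same outcome with certainty.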

Thus far we have been referring to measurements on a single isolated Hilbert space, for which PVMs suffice. But in practice we only ever deal with subsystems, for which our concept of measurement must be suitably extended. As we shall see, the relevant entities for the job are positive operator-valued measures, or POVMs. The key difference is that PVMs are the subset of POVMs whose elements are mutually orthogonal projectors by construction.

Mathematically, a POVM is a measure (basically, a partition of unity) whose values are non-negative self-adjoint operators on Hilbert space. That is, denoting the set of operators that comprise the POVM by {\{F_a\}}, it has the properties {F_a=F_a^\dagger}, {\left<\psi|F_a|\psi\right>\geq0}, and {\sum_aF_a=1}, where {\left|\psi\right>\in\mathcal{H}}. The idea is that a POVM element {F_a} is assigned to every possible measurement result such that {\left<\psi|F_a|\psi\right>=\mathrm{Prob}(a)} (hence the requirement that these sum to 1).

Given the positivity of the operators {F_a}, there exists a (not necessarily unique) set of so-called measurement operators {\{M_a\}} such that {F_a=M_a^\dagger M_a}. Introducing these operators allows one to express the state immediately after measurement in the usual manner:

\displaystyle \left|\psi_a\right>=\frac{M_a\left|\psi\right>}{\left<\psi\right|M_a^\dagger M_a\left|\psi\right>^{1/2}}~. \ \ \ \ \ (5)

Note that this expression has precisely the same form as that given for PVMs above, with {M_a} playing the role of {P_a}. The difference here is that in the case of a POVM, repeated measurement will not necessarily yield the same result. This is because unlike the {P_a}, which are idempotent orthogonal projection operators, the {F_a} are not in general projectors, and hence the state after measurement is not left in an eigenstate that a second measurement would simply reproduce. The PVM {\{P_a\}}, which is used in decomposing an observable {A=\sum_aa_aP_a}, corresponds to the special case of a POVM with {F_a=P_a\left(=M_a\right)}.
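A standard example that makes the non-repeatability explicit is the three-outcome "trine" POVM on a qubit, sketched below. It is not taken from the discussion above; the choice {M_a=\sqrt{F_a}} is one of the (non-unique) options, and the input state `psi` is a hypothetical choice.

```python
import numpy as np

# Three real unit vectors in a qubit, 120 degrees apart.
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
kets = [np.array([np.cos(t), np.sin(t)]) for t in angles]

# POVM elements F_a = (2/3)|psi_a><psi_a| and one choice of M_a = sqrt(F_a).
F = [(2.0 / 3.0) * np.outer(v, v) for v in kets]
M = [np.sqrt(2.0 / 3.0) * np.outer(v, v) for v in kets]

# Measure the (hypothetical) state |0>: Prob(a) = <psi|F_a|psi>.
psi = np.array([1.0, 0.0])
probs = [float(psi @ Fa @ psi) for Fa in F]

# Post-measurement state for outcome a = 0, equation (5), then measure again.
post0 = M[0] @ psi / np.sqrt(probs[0])
prob_repeat = float(post0 @ F[0] @ post0)
```

The three elements sum to the identity but are neither idempotent nor mutually orthogonal, and obtaining outcome 0 a second time has probability 2/3 rather than 1.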

To elaborate on this slightly further, let us take the familiar example of a tensor product space {\mathcal{H}=\mathcal{H}_A\otimes\mathcal{H}_B}, containing an initial state {\rho_{AB}=\rho_A\otimes\rho_B} and a PVM given by {\{P_a\}}. We now wish to restrict our attention to {\mathcal{H}_A}, so we define a new set of operators {\{F_a\}} acting thereupon that faithfully reproduces the outcome labeled by index {a} of a measurement on {\mathcal{H}}, namely:

\displaystyle \mathrm{Prob}(a)=\mathrm{tr}\left( P_a\rho_{AB}\right)=\mathrm{tr}_A\left(\mathrm{tr}_B\left( P_a\rho_{AB}\right)\right)\equiv\mathrm{tr}_A\left({F_a\rho_A}\right)~. \ \ \ \ \ (6)

We may obtain an explicit expression for {F_a} by writing this expression in component form. Recall that a reduced density matrix can be written in terms of basis vectors as

\displaystyle \rho_A=\mathrm{tr}_B\left(\left|\psi\right>\left<\psi\right|\right)=\sum_{ijm}a_{mj}^*a_{ij}\left|i\right>_{A~A}\left<m\right|~. \ \ \ \ \ (7)

Since {j} is a dummy index, this requires two indices when written in matrix notation, {\left(\rho_A\right)_{im}}. This implies that four indices will label the tensor product {\rho_{AB}=\rho_A\otimes\rho_B}. The quantity {F_a\rho_A} therefore carries two free indices (since {F_a} is a map from {\mathcal{H}_A\rightarrow\mathcal{H}_A}), and similarly {P_a\rho_{AB}} carries four, all of which will be summed over when taking the appropriate traces. Hence the above expression, in component form, is

\displaystyle \begin{aligned} \sum_{ijmn}\left( P_a\right)_{nj,mi}\left(\rho_A\right)_{ij}&\left(\rho_B\right)_{mn}=\sum_{ij}\left( F_a\right)_{ji}\left(\rho_A\right)_{ij}\\ \implies\left( F_a\right)_{ji}=&\sum_{mn}\left( P_a\right)_{nj,mi}\left(\rho_B\right)_{mn}~, \end{aligned} \ \ \ \ \ (8)

where {\{\left|i\right>\}}, {\{\left|j\right>\}} and {\{\left|m\right>\}}, {\{\left|n\right>\}} are orthonormal bases for {\mathcal{H}_A} and {\mathcal{H}_B}, respectively. With this expression for {F_a} in hand, one can show (see, e.g., Preskill p87) that the {F_a} do indeed satisfy the properties claimed for them above, namely Hermiticity, positivity (non-negativity), and completeness {\left(\sum_aF_a=I_A\right)}. As we have emphasized however, they are not necessarily orthogonal, which is again the crucial difference between POVMs and PVMs. Indeed, the number of {F_a}’s is limited by the dimension of the total Hilbert space {\mathcal{H}}, which may be arbitrarily greater than that of {\mathcal{H}_A}.
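The reduction in (6) and (8) can be checked numerically. The sketch below takes the four Bell-state projectors on two qubits as the PVM and computes {F_a=\mathrm{tr}_B\left[\left( I_A\otimes\rho_B\right) P_a\right]}, which is (8) written without indices. The matrices `rho_A` and `rho_B` are hypothetical, and the code orders the tensor factors as A first, B second (the convention of `np.kron`), which may differ from the index ordering used in (8).

```python
import numpy as np

def partial_trace_B(X, dA, dB):
    """Trace out the second tensor factor of an operator on H_A (x) H_B."""
    return np.einsum('ibjb->ij', X.reshape(dA, dB, dA, dB))

dA = dB = 2
s = 1 / np.sqrt(2)

# PVM on the full space: projectors onto the four Bell states.
bell = [np.array([s, 0, 0, s]), np.array([s, 0, 0, -s]),
        np.array([0, s, s, 0]), np.array([0, s, -s, 0])]
P = [np.outer(b, b.conj()) for b in bell]

# Hypothetical product state rho_A (x) rho_B.
rho_A = np.array([[0.6, 0.2], [0.2, 0.4]])
rho_B = np.diag([0.7, 0.3])
rho_AB = np.kron(rho_A, rho_B)

# Induced POVM elements on A alone.
F = [partial_trace_B(np.kron(np.eye(dA), rho_B) @ Pa, dA, dB) for Pa in P]

# Both sides of equation (6).
probs_full = [np.trace(Pa @ rho_AB).real for Pa in P]
probs_reduced = [np.trace(Fa @ rho_A).real for Fa in F]
```

Note that this yields four POVM elements on a two-dimensional space, none of which is a projector, even though the {P_a} are orthogonal rank-one projectors on the full space.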

As one might have expected given that POVMs act on subspaces, a POVM can be lifted to a PVM by expanding the Hilbert space of the former and performing the latter in the resulting superspace. This is the content of Neimark’s (sometimes transliterated from the Cyrillic “Наймарк” as “Neumark”) theorem. In other words, one can realize a POVM as a PVM on an enlarged Hilbert space, which allows one to obtain the correct measurement probabilities (by which we mean, the relative weights in the ensemble; see below) by performing orthogonal projections. The converse also holds: any PVM on a Hilbert space reduces to a POVM on any subspace thereof, so that an orthogonal measurement of a bipartite system {\mathcal{H}_A\otimes\mathcal{H}_B} may be a nonorthogonal POVM on {A} alone.

In addition to the crucial role they play in measurement, POVMs are useful for formulating a suitable generalization of evolution that applies to subsystems. By way of example, suppose the initial state in {\mathcal{H}=\mathcal{H}_A\otimes\mathcal{H}_B} is given by {\rho_{AB}=\rho_A\otimes\left|0\right>_{BB}\left<0\right|}. Since evolution of the total bipartite system is unitary, it is described by the action of a unitary operator {U_{AB}},

\displaystyle U_{AB}\left(\rho_A\otimes\left|0\right>_{BB}\left<0\right|\right) U_{AB}^\dagger~, \ \ \ \ \ (9)

whereupon the density matrix of subsystem {A} is

\displaystyle \rho'_A=\mathrm{tr}_B\left( U_{AB}\left(\rho_A\otimes\left|0\right>_{BB}\left<0\right|\right) U_{AB}^\dagger\right) =\sum_n{}_B\left<n\right|U_{AB}\left|0\right>_B\rho_A{}_B\left<0\right| U_{AB}^\dagger\left|n\right>_B~, \ \ \ \ \ (10)

where {\{\left|n\right>\}} is an orthonormal basis for {\mathcal{H}_{B}}, and {{}_B\left<n\right|U_{AB}\left|0\right>_B\equiv M_n} is an operator acting on {\mathcal{H}_{A}}. Note that it follows from the unitarity of {U_{AB}} that

\displaystyle \sum_nM_n^\dagger M_n =\sum_n{}_B\left<0\right|U_{AB}^\dagger\left|n\right>_{BB}\left<n\right| U_{AB}\left|0\right>_B ={}_B\left<0\right|U_{AB}^\dagger U_{AB}\left|0\right>_B =I_A~. \ \ \ \ \ (11)

We may thus express {\rho'_A} succinctly as

\displaystyle \rho'_A=\sum_nM_n\rho_A M_n^\dagger\equiv\$\left(\rho_A\right)~, \ \ \ \ \ (12)

where {\$} is a linear map that takes density matrices to density matrices (linear operators to linear operators). Such a map, when the above property of {M_n} is satisfied, is called a superoperator, which we’ve written here in the so-called operator sum or Kraus representation. The operator sum representation of a given superoperator {\$} is not unique, since performing the trace over {\mathcal{H}_B} in a different basis would lead to different measurement operators {N_i}. However, any two operator sum representations of the same superoperator are related by a unitary change of basis, e.g., {N_i=\sum_nU_{in}M_n} (in other words, the {M_n} may be thought of as a particular choice of the measurement operators {M_a} considered above).
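The construction in (10) through (12) can be sketched numerically as follows. The unitary `U` (a rotation of B controlled by A), the angle `theta`, and the input state are hypothetical choices for illustration; the tensor factors are ordered with A first.

```python
import numpy as np

def kraus_from_unitary(U, dA, dB):
    """M_n = <n|_B U |0>_B for U acting on H_A (x) H_B (A is the first factor)."""
    U4 = U.reshape(dA, dB, dA, dB)
    return [U4[:, n, :, 0] for n in range(dB)]

def apply_channel(kraus, rho):
    """Operator-sum form (12): rho -> sum_n M_n rho M_n^dagger."""
    return sum(M @ rho @ M.conj().T for M in kraus)

# Hypothetical U_AB: rotate B by theta if A is in |1>, do nothing if A is in |0>.
theta = 0.4
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s], [s, c]])
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
U = np.kron(P0, np.eye(2)) + np.kron(P1, R)

kraus = kraus_from_unitary(U, 2, 2)

# Compare (12) with the explicit partial trace in (10), for rho_A = |+><+|.
rho_A = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])
ket0_B = np.diag([1.0, 0.0])
rho_AB_out = U @ np.kron(rho_A, ket0_B) @ U.conj().T
rho_out_direct = np.einsum('ibjb->ij', rho_AB_out.reshape(2, 2, 2, 2))
rho_out_kraus = apply_channel(kraus, rho_A)

# A unitary mixing of the Kraus operators, N_i = sum_n V_in M_n, gives the same map.
V = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
kraus_alt = [sum(V[i, n] * kraus[n] for n in range(2)) for i in range(2)]
rho_out_alt = apply_channel(kraus_alt, rho_A)
```

For this particular `U` the map simply multiplies the off-diagonal elements of {\rho_A} by {\cos\theta}, leaving the diagonal untouched.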

The mapping {\$:\rho\rightarrow\rho'} preserves the usual properties of {\rho}: Hermiticity, positivity, and unit trace ({\mathrm{tr}\rho'=1} if {\mathrm{tr}\rho=1}). But these are not quite sufficient to ensure that our bipartite system evolves unitarily. The basic reason is that we are limiting our attention to subsystem {A}, and have no guarantee that there does not exist an uncoupled system {B} that evolves in such a manner as to screw things up. To remedy this, we demand that {\$_A} instead satisfy complete positivity: {\$_A} is completely positive if {\$_A\otimes I_B} is positive for every extension of {\mathcal{H}_A} to {\mathcal{H}_A\otimes\mathcal{H}_B}. For an example of the necessity of this requirement, see Preskill p97-98 for an exposition of the transposition map, {T:\rho\rightarrow\rho^T}, which is positive but not completely positive.
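The failure of complete positivity for transposition is easy to see numerically. In the sketch below, {T} acting on a single-qubit density matrix leaves the spectrum unchanged, while {T\otimes I} acting on a maximally entangled two-qubit state produces a negative eigenvalue. The single-qubit matrix `rho_A` is a hypothetical example.

```python
import numpy as np

def transpose_on_A(rho, dA, dB):
    """Apply T (x) I: transpose only the indices of the first tensor factor."""
    r = rho.reshape(dA, dB, dA, dB).transpose(2, 1, 0, 3)
    return r.reshape(dA * dB, dA * dB)

# T alone is positive: rho^T has the same eigenvalues as rho.
rho_A = np.array([[0.6, 0.2], [0.2, 0.4]])
eig_T = np.linalg.eigvalsh(rho_A.T)

# T (x) I applied to the Bell state (|00> + |11>)/sqrt(2).
s = 1 / np.sqrt(2)
bell = np.array([s, 0, 0, s])
rho_AB = np.outer(bell, bell)
eig_TI = np.linalg.eigvalsh(transpose_on_A(rho_AB, 2, 2))
```

The output of {T\otimes I} has eigenvalues {\pm1/2}, so it is not a valid density matrix, even though {T} on its own maps every density matrix to a density matrix.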

In addition to these three necessary properties, it is also customary to assume that {\$} is linear. As alluded to in the previous post on the subject, non-linear evolution is difficult to reconcile with the ensemble interpretation, due to the inherently linear nature of probability. In some sense, linearity is demanded by the probabilistic interpretation (and indeed, as explained in Preskill, non-linear evolution can lead to rather strange consequences), but I’m not aware of any rigorous proof. Nonetheless, for the time being we shall demand this property of superoperators as well.

Unitary evolution, for an isolated system, is described by the Schrödinger equation. The analogous equation for general evolution by superoperators is called the master equation. Preskill elaborates on this in some detail in section 3.5, but we will restrain ourselves from getting involved in such details here. Instead, we merely observe that unitary evolution can be thought of as the special case in which the operator sum contains only a single term. Under unitary evolution, pure states can only evolve to pure states:

\displaystyle \left|\psi\right>\left<\psi\right| \rightarrow U\left(\left|\psi\right>\left<\psi\right|\right) U^\dagger =\left|\psi'\right>\left<\psi'\right|~, \ \ \ \ \ (13)

and similarly mixed states remain mixed. But superoperators allow the evolution of pure states to mixed states. This is called decoherence. It is the process by which initially pure states become entangled, and consequently, it plays a fundamental role in both the mathematics of quantum mechanics and the (philosophical) interpretation thereof.
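One simple way to see the distinction is to track the purity {\mathrm{tr}\rho^2}, which equals 1 for pure states and is strictly less than 1 for mixed states. The sketch below compares a unitary (a Hadamard gate) with a phase-damping superoperator; both the choice of channel and the parameter `p` are hypothetical, picked only for illustration.

```python
import numpy as np

def purity(rho):
    """tr(rho^2): equal to 1 for a pure state, below 1 for a mixed state."""
    return np.trace(rho @ rho).real

# Pure initial state |+><+|.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)

# Unitary evolution (single Kraus operator): purity is preserved.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
rho_unitary = H @ rho @ H.conj().T

# Phase damping with two Kraus operators: sqrt(1-p) I and sqrt(p) Z.
p = 0.25
Z = np.diag([1.0, -1.0])
kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]
rho_dephased = sum(M @ rho @ M.conj().T for M in kraus)

purities = (purity(rho), purity(rho_unitary), purity(rho_dephased))
```

With these numbers the purity stays at 1 under the unitary and drops to 0.625 under phase damping: the off-diagonal elements shrink by a factor {1-2p} while the populations are unchanged.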

To connect back to our earlier example, suppose we perform a POVM on {\mathcal{H}_A}. By (11) and (12), this is tantamount to evolving the system with a superoperator that takes

\displaystyle \rho\rightarrow\sum_a\sqrt{F_a}\rho\sqrt{F_a}~. \ \ \ \ \ (14)

By Neimark’s theorem, the POVM {\{F_a\}} has a unitary representation on the bipartite space {\mathcal{H}}, meaning that there exists a unitary {U_{AB}} such that

\displaystyle U_{AB}:\left|\phi\right>_A\otimes\left|0\right>_B\rightarrow\sum_a\sqrt{F_a}\left|\phi\right>_A\otimes\left|a\right>_B~. \ \ \ \ \ (15)

In other words, the bipartite system undergoes a unitary transformation that entangles {A} with {B},

\displaystyle \left|\phi\right>_A\left|0\right>_B\rightarrow\sum_aM_a\left|\phi\right>_A\left|a\right>_B~. \ \ \ \ \ (16)

We could thus describe the measurement by a PVM on {\mathcal{H}_B} that projects onto {\{\left|a\right>\}} with probability

\displaystyle \mathrm{Prob}(a)={}_A\left<\phi\right|M_a^\dagger M_a\left|\phi\right>_A=\mathrm{tr}\left( F_a\rho_A\right)~, \ \ \ \ \ (17)

where the second equality follows from comparison with (6). Normalizing the final state accordingly, the term in (14) associated with outcome {a} becomes

\displaystyle \rho_A\rightarrow\frac{\sqrt{F_a}\rho_A\sqrt{F_a}}{\mathrm{tr}\left( F_a\rho_A\right)}~. \ \ \ \ \ (18)

We mentioned previously that for POVMs, repeated measurements will not necessarily yield the same result. Now we see why: the result of such a general measurement (that is, on a subsystem) is given by an ensemble of pure states, and thus we require a description in terms of a density matrix rather than as a single (orthogonal) eigenstate.

This is also the description we would use if we knew only that a measurement had been performed, but were ignorant of the results. For example, suppose we perform a measurement by probing the system with a single particle (say, a photon from a laser). Immediately after the interaction with the probe, but before the interaction with the classical detector that records it, the system is in an entangled state. We would thus describe the process as evolution by a superoperator that produces a density matrix/ensemble as above. In other words, the system has slightly decohered: if the initial state were pure, some of the coherence has been lost upon evolution to a mixed state. The subsequent interaction with the (classical) detector that we colloquially think of as “measurement” is simply the same process of decoherence on a hugely expanded scale: the (now mixed) state becomes entangled with the trillions of particles that comprise the detector, decohering essentially instantaneously to a classical state. All the uniquely quantum information of the system has now been lost.

This is what is referred to as “collapse of the wavefunction” in the Copenhagen interpretation. The reason for the invalidity of this interpretation is that it posits a projection onto a single eigenstate as a result of observation (by which we simply mean, interaction with the measurement apparatus; anthropocentric language aside, consciousness is emphatically not involved in any fundamental way). But as we’ve seen above, a proper description of measurement is that of entanglement with the environment under evolution via superoperators. The measurement process proceeds by POVMs, not PVMs, on the (sub)system under study. And while at the end of the day one does arrive at an eigenstate in the expanded Hilbert space (that includes the measurement apparatus/detector/observer/etc), this is a consequence of decohering to a classical state, rather than directly projecting to it. Decoherence can thus be thought of as giving the appearance of wavefunction collapse; but as evidenced by the countless reams of confused literature on quantum foundations and related areas, it is most dangerous to indulge in such simplifications so blithely. (We note in passing that the “wavefunction of the universe” never decoheres, since evolution in an isolated system is unitary).

Another important fact that no doubt contributes to the collapse confusion is that decoherence is irreversible. Consider composing two superoperators to form a third: if {\$_1} describes the evolution from {t_0} to {t_1>t_0}, and {\$_2} describes the evolution from {t_1} to {t_2>t_1}, then {\$_2\circ\$_1} is a superoperator describing the evolution from {t_0} to {t_2}. But the inverse of a superoperator is only a superoperator if it is unitary. This is in stark contrast to unitary evolution, which is perfectly invertible: we can run the equations backwards as well as forwards. Not so for superoperators: inverting {\$_2\circ\$_1} will not result in a superoperator that evolves backwards from {t_2} to {t_0}. In other words, decoherence implies an arrow of time, and an irrevocable loss of quantum information. And while the former implication has philosophical implications which we shall not digress upon here, the latter is not at all surprising: as stated above, decoherence is the process by which quantum states become classical.
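To illustrate, consider a phase-damping map (a hypothetical choice, with an arbitrary parameter `p`) that rescales the off-diagonal elements of a qubit density matrix by {1-2p}. As a linear map it can be inverted for {p\neq1/2}, but the sketch below shows that the inverse takes a perfectly good density matrix to an operator with a negative eigenvalue, so the inverse is not a superoperator.

```python
import numpy as np

def dephase(rho, p):
    """Phase damping: rho -> (1-p) rho + p Z rho Z."""
    Z = np.diag([1.0, -1.0])
    return (1 - p) * rho + p * Z @ rho @ Z

def dephase_inverse(rho, p):
    """Linear inverse of dephase (needs p != 1/2): undo the off-diagonal rescaling."""
    out = rho.astype(complex).copy()
    out[0, 1] /= (1 - 2 * p)
    out[1, 0] /= (1 - 2 * p)
    return out

p = 0.25
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)

# The inverse does undo the forward map ...
roundtrip = dephase_inverse(dephase(rho, p), p)

# ... but acting on a valid state it fails to return a valid state.
bad = dephase_inverse(rho, p)
bad_eigs = np.linalg.eigvalsh(bad)
```

The operator `bad` still has unit trace, but its eigenvalues are 1.5 and -0.5, so it cannot be interpreted as an ensemble of states with non-negative probabilities.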

Several open questions remain. Perhaps chief among them is our failure to fully resolve the “disconcerting dualism” between deterministic evolution and probabilistic measurement. Insofar as probability is a statement of our ignorance and thus fundamentally epistemic, any formulation of quantum mechanics that relies thereupon is doomed to suffer the same characterization, for what does it mean to say that nature is fundamentally probabilistic? We may ask whether the associated lack of predictivity in quantum mechanics stems from the fact that there does not exist a state which is an eigenstate of all observables. One also wonders whether it is possible to formulate a consistent theory with non-linearly evolving superoperators, and what the interpretation thereof would be vis-à-vis probabilistic ensembles (that is, to what extent we can free ourselves from probability if we distance ourselves from the linearity it imposes). Zurek’s work on decoherence contains some clarifying insight into this issue, but that’s a subject for another post.

It is tempting to speculate that the issue of how to properly describe measurement and evolution lies at the heart of the black hole information paradox, wherein a black hole formed from the collapse of an initially pure state appears to evolve to a mixed state, in violation of the supposedly unitary S-matrix. Indeed, for various reasons, this picture is almost certainly too naïve. In particular, evolution is not unitary, but it remains to be shown precisely how a more ontologically accurate rendition of the problem would solve it.
