max(X, Y) or sqrt(X)?

Cover Image for max(X, Y) or sqrt(X)?

Intro

I originally learnt this from Stand-up Maths who creates excellent entertaining videos about fun results like this one. Suppose you have a random number generator that returns numbers between 0 and 1 with equal probability. Would you rather pick the maximum of two random numbers or the (positive) square root of one random number if you want the largest value?

In more formal mathematical terms, let XX and YY be two independent random variables with the standard uniform distribution.

X,YU(0,1)X, Y \sim U(0, 1)

How do the distributions of max(X,Y)max(X, Y) and sqrt(X)sqrt(X) relate to each other?

First observations

What do max(X,Y)max(X, Y) and sqrt(X)sqrt(X) do at a basic level? Imagine XX as a random point on a line between 0 and 1.

Given a particular value of XX, max(X,Y)max(X, Y) must be greater than or equal to XX by definition.

max-example

Similarly, defining sqrt(X)sqrt(X) as the unique positive square root must be greater than or equal to XX due to the value of XX being in the interval [0,1][0, 1]. For example, 0.25=0.5\sqrt{0.25}=0.5, 0.50.707\sqrt{0.5} \approx 0.707.

sqrt-example

As well as being monotonic (increasing) functions, both f(x)=x2f(x) = x^2 and f(x)=sqrt(x)=x1/2f(x) = sqrt(x) = x^{1/2} are in fact continuous, bijective functions on the interval [0,1][0, 1]. This means given any value y[0,1]y \in [0, 1], there exists a unique value x[0,1]x \in [0, 1] such that f(x)=yf(x) = y.

square-sqrt-graph

Probability distribution

What's the probability that max(X,Y)max(X, Y) is less than or equal to a given value α[0,1]α \in [0, 1]?

Recall that XX and YY are random variables with the standard uniform distribution so their cumulative distribution function is simply the width of the interval within [0,1][0, 1]:

P(Xα)=P(Yα)=α01=αP(X \leq α) = P(Y \leq α) = \frac{α - 0}{1} = α

We also know that XX and YY are independent so:

P(max(X,Y)α)=P(XαYα)=P(Xα)P(Yα)=αα=α2\begin{align*} P(max(X, Y) \leq α) &= P(X \leq α \cap Y \leq α) \\ &= P(X \leq α)P(Y \leq α) \\ &= α * α \\ &= α^2 \end{align*}

What's the probability that sqrt(X)sqrt(X) is less than or equal to a given value α[0,1]α \in [0, 1]?

From our first observations we noticed that squaring is a monotonic increasing function on [0,1][0, 1] which means that it preserves an inequality when applied to both sides.

P(sqrt(X)α)=P(Xα2)=α2\begin{align*} P(sqrt(X) \leq α) &= P(X \leq α^2) \\ &= α^2 \end{align*}

Their cumulative distribution functions are equal everywhere and therefore max(X,Y)max(X, Y) and sqrt(X)sqrt(X) are identically distributed!

Transforming the problem

We can transform the problem to view the intuition behind this visually. As we have two random independent variables, rather than a random point on a 1D line imagine XX and YY as a random point on a 2D grid where each point is equally likely to be picked.

2d-point

In this 2D grid, we see that the boundary lines X=αX = α and Y=αY = α define a square such that any point inside this square satisfies max(X,Y)αmax(X, Y) \leq α.

2d-max

As each point is equally likely to be picked, the probability of picking a point inside the square is proportional to the area of the grid and we get the same result:

P(max(X,Y)α)=area(square)area(wholegrid)=α212=α2\begin{align*} P(max(X, Y) \leq α) &= \frac{area(square)}{area(whole grid)} \\ &= \frac{α^2}{1^2} \\ &= α^2 \end{align*}

For sqrt(X)sqrt(X), we extend the 1D line to a 2D grid where each value sqrt(X)=αsqrt(X) = α defines a unique square XX in the 2D grid where the value of XX is the area of the square X=α2X = α^2 and vice versa.

2d-sqrt

From this we can visually see the same inequality as before:

P(sqrt(X)α)=P(Xα2)=α2\begin{align*} P(sqrt(X) \leq α) &= P(X \leq α^2) \\ &= α^2 \end{align*}

Conclusion

The result is hard to believe at first. How could max(X,Y)max(X, Y) and sqrt(X)sqrt(X) possibly be identically distributed? With this magic setup these two random functions can in fact be linked together where introducing the second independent identically distributed random variable YY to XX has an effect of introducing "squaring" to the problem.

We also saw that transforming the problem to visualise the problem in a new way can help provide intuition into how a structure behaves and aid constructing a proof. This powerful technique pops up often in mathematics and can remarkably link one problem to another, sometimes in a completely different branch of mathematics!