max(X, Y) or sqrt(X)?
Intro
I originally learnt this from Stand-up Maths, who creates excellent, entertaining videos about fun results like this one. Suppose you have a random number generator that returns numbers between 0 and 1, each equally likely. If you want the largest value, would you rather take the maximum of two random numbers or the (positive) square root of one random number?
In more formal mathematical terms, let $X$ and $Y$ be two independent random variables with the standard uniform distribution on $[0, 1]$.
How do the distributions of $\max(X, Y)$ and $\sqrt{X}$ relate to each other?
First observations
What do $\max(X, Y)$ and $\sqrt{X}$ do at a basic level? Imagine $X$ as a random point on a line between 0 and 1.
Given a particular value of $X$, $\max(X, Y)$ must be greater than or equal to $X$ by definition.
Similarly, defining $\sqrt{X}$ as the unique positive square root, $\sqrt{X}$ must be greater than or equal to $X$ because the value of $X$ lies in the interval $[0, 1]$. For example, $\sqrt{0.25} = 0.5 \ge 0.25$ and $\sqrt{0.81} = 0.9 \ge 0.81$.
As well as being monotonic (increasing) functions, both $x \mapsto \sqrt{x}$ and $x \mapsto x^2$ are in fact continuous, bijective functions on the interval $[0, 1]$. This means that given any value $x \in [0, 1]$, there exists a unique value $y \in [0, 1]$ such that $y^2 = x$.
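These pointwise observations are easy to spot-check numerically. The following is a minimal sketch in Python using only the standard library; the function name `check_observations`, the sample count and the seed are illustrative choices, not part of the original argument.

```python
import math
import random


def check_observations(samples=10_000, seed=0):
    """Spot-check the pointwise claims on random points of [0, 1)."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        # max(x, y) can never be smaller than x
        if max(x, y) < x:
            return False
        # for x in [0, 1], the positive square root is at least x
        if math.sqrt(x) < x:
            return False
        # squaring undoes the square root (up to floating-point rounding)
        if not math.isclose(math.sqrt(x) ** 2, x, rel_tol=1e-12, abs_tol=1e-15):
            return False
    return True


print(check_observations())
```

A passing check is not a proof, of course; it only confirms that no sampled point contradicts the observations.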
Probability distribution
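What's the probability that $\max(X, Y)$ is less than or equal to a given value $t \in [0, 1]$?
Recall that $X$ and $Y$ are random variables with the standard uniform distribution, so their cumulative distribution function is simply the width of the interval $[0, t]$ within $[0, 1]$:
$$P(X \le t) = P(Y \le t) = t$$
We also know that $X$ and $Y$ are independent, and that $\max(X, Y) \le t$ holds exactly when both $X \le t$ and $Y \le t$, so:
$$P(\max(X, Y) \le t) = P(X \le t) \, P(Y \le t) = t^2$$
What's the probability that $\sqrt{X}$ is less than or equal to a given value $t \in [0, 1]$?
From our first observations we noticed that squaring is a monotonic increasing function on $[0, 1]$, which means that it preserves an inequality when applied to both sides:
$$P(\sqrt{X} \le t) = P(X \le t^2) = t^2$$
Their cumulative distribution functions are equal everywhere and therefore $\max(X, Y)$ and $\sqrt{X}$ are identically distributed!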
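A quick simulation can make this result easier to believe. The sketch below estimates both cumulative distribution functions empirically and prints them next to $t^2$; the function name `empirical_cdfs`, the sample size and the chosen thresholds are assumptions made for illustration.

```python
import math
import random


def empirical_cdfs(t, samples=200_000, seed=0):
    """Estimate P(max(X, Y) <= t) and P(sqrt(X) <= t) by simulation."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    max_hits = 0
    sqrt_hits = 0
    for _ in range(samples):
        # two independent uniform draws for the maximum
        if max(rng.random(), rng.random()) <= t:
            max_hits += 1
        # a separate single uniform draw for the square root
        if math.sqrt(rng.random()) <= t:
            sqrt_hits += 1
    return max_hits / samples, sqrt_hits / samples


for t in (0.25, 0.5, 0.75):
    p_max, p_sqrt = empirical_cdfs(t)
    print(f"t={t}: max ~ {p_max:.3f}, sqrt ~ {p_sqrt:.3f}, t^2 = {t * t:.4f}")
```

Both estimates should land close to $t^2$ for every threshold, with the small differences coming from sampling noise.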
Transforming the problem
We can transform the problem to view the intuition behind this visually. As we have two independent random variables, rather than a random point on a 1D line, imagine $X$ and $Y$ as the coordinates of a random point $(X, Y)$ on a 2D grid where each point is equally likely to be picked.
In this 2D grid, we see that the boundary lines $x = t$ and $y = t$ define a square such that any point $(x, y)$ inside this square satisfies $\max(x, y) \le t$.
As each point is equally likely to be picked, the probability of picking a point inside the square is the area of the square divided by the area of the whole grid, which is 1, and we get the same result:
$$P(\max(X, Y) \le t) = t^2$$
For $\sqrt{X}$, we extend the 1D line to a 2D grid where each value of $\sqrt{X}$ is the side length of a unique square in the 2D grid, the value of $X$ is the area of that square, and vice versa.
From this we can visually see the same inequality as before:
$$\sqrt{X} \le t \iff X \le t^2$$
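The area argument can also be checked exactly on a discretised grid, with no randomness involved. In the sketch below (the helper names `grid_fraction` and `line_fraction` and the values of `n` and `k` are hypothetical, chosen for illustration), the unit square is cut into `n * n` equal cells and the unit line into `n * n` equal segments, with the threshold set to `t = k / n`.

```python
from fractions import Fraction


def grid_fraction(n, k):
    """Fraction of the n x n cells whose row and column index are both below k."""
    inside = sum(1 for i in range(n) for j in range(n) if max(i, j) < k)
    return Fraction(inside, n * n)


def line_fraction(n, k):
    """Fraction of n*n equal line segments lying below the cut at (k/n)**2."""
    inside = sum(1 for i in range(n * n) if i < k * k)
    return Fraction(inside, n * n)


n, k = 10, 7  # threshold t = k / n = 0.7
print(grid_fraction(n, k), line_fraction(n, k))
```

With `n = 10` and `k = 7`, both counts come to 49 out of 100, which matches $t^2 = 0.49$: the cells with $\max(x, y) \le t$ fill a square of side $t$, and the segments with $\sqrt{x} \le t$ fill a length of $t^2$.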
Conclusion
The result is hard to believe at first. How could $\max(X, Y)$ and $\sqrt{X}$ possibly be identically distributed? With this magic setup, the two random quantities are in fact linked: introducing the second independent, identically distributed random variable $Y$ to $\max(X, Y)$ has the effect of introducing "squaring" to the problem.
We also saw that transforming a problem so it can be visualised in a new way can provide intuition into how a structure behaves and aid in constructing a proof. This powerful technique pops up often in mathematics and can remarkably link one problem to another, sometimes in a completely different branch of mathematics!