Monte Carlo Approximation Methods: Which one should you choose and when? (2024)

Inverse Transform Sampling, as its name suggests, uses the inverse cumulative distribution function (CDF) of a target distribution to generate random numbers that follow a desired distribution. The basic idea is:

  1. Generate a Uniform Random Number: We draw a number U from a uniform distribution between 0 and 1.
  2. Apply the Inverse CDF: Using the inverse of the target distribution’s CDF, transform U into a sample that follows the target distribution.

Here’s a quick illustration of how the samples (blue) are drawn from the distribution (red):

[Figure: samples (blue) drawn via inverse CDF, overlaid on the target distribution (red)]

Inverse CDF is a computationally simple and generalizable method to sample from distributions for which we know the CDF. For some distributions, such as the exponential, the inverse CDF has a closed form; for others, such as the normal, gamma, or beta, it must be evaluated numerically.

PDF, CDF, and Inverse CDF

[Figure: relationship between the PDF, CDF, and inverse CDF]

In intuitive terms, the CDF accumulates the PDF: it is the integral of the PDF from the left edge of the support up to a given point. Inverting this function gives the inverse CDF (also called the quantile function) used by this method.
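To make this concrete, here is a quick numerical sanity check (a sketch using NumPy; the Exp(1) distribution and grid resolution are illustrative choices): accumulating the exponential PDF numerically recovers its closed-form CDF.

```python
import numpy as np

# Sanity check: numerically accumulating the Exp(1) PDF recovers the
# closed-form CDF F(x) = 1 - exp(-x)
x = np.linspace(0, 10, 10_001)
pdf = np.exp(-x)
cdf_numeric = np.cumsum(pdf) * (x[1] - x[0])  # Riemann sum of the PDF
cdf_exact = 1 - np.exp(-x)
print(np.max(np.abs(cdf_numeric - cdf_exact)))  # small discretization error
```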

Formally, if X is a random variable, then the CDF of X is given by:

F(x) = P(X ≤ x)

A CDF F has the following key properties:

  • F is right-continuous (and continuous whenever X has a density)
  • F is non-decreasing
  • F has range 0 ≤ F(x) ≤ 1 for all x ∈ R

How does the inverse CDF algorithm work?

The algorithm consists of the following ingredients:

Input:

  • U: U is a uniform random variable drawn between 0 and 1.
  • This serves as the input probability for the inverse CDF and is what will be transformed into a sample from the desired distribution.

Parameter:

  • F: the CDF of the target distribution.
  • With F, we compute its inverse, F^-1, and use it to map a probability in [0, 1] back to the domain of the target distribution.

Output:

  • x: a random sample drawn from the target distribution.
  • This is generated by applying the inverse CDF to the random number from the uniform distribution (input).

Python Implementation

Now, let’s implement this method from scratch. We will use the exponential distribution (with rate λ = 1), since it makes it easy to visualize the samples drawn by inverse CDF and compare them to the exact distribution:

By standard calculus, the PDF f(x) = e^(−x) (for x ≥ 0) integrates to the target CDF:

F(x) = 1 − e^(−x), and therefore F^(−1)(p) = −ln(1 − p)

We will generate 1,000 samples using the Inverse CDF method:

import numpy as np
import matplotlib.pyplot as plt

# Inverse CDF for the exponential distribution with lambda = 1
def inverse_cdf(p):
    return -np.log(1 - p)

# Generate 1000 sample values using inverse CDF
uniform_samples = np.random.uniform(0, 1, 1000)
transformed_samples = inverse_cdf(uniform_samples)

# Generate x values and true PDF for comparison
x_values = np.linspace(0, np.max(transformed_samples), 1000)
pdf_values = np.exp(-x_values)

# Plot the sample histogram against the exact PDF
plt.hist(transformed_samples, bins=50, density=True, alpha=0.6, label="Inverse CDF samples")
plt.plot(x_values, pdf_values, "r", label="Exact exponential PDF")
plt.legend()
plt.show()

[Figure: histogram of the 1,000 inverse-CDF samples against the exact exponential PDF]

Requirements for the inverse CDF algorithm to work

The inverse CDF method makes a key assumption:

  • CDF F is invertible: The CDF F must be invertible, meaning that each input to F must map to a unique output; equivalently, F must be strictly increasing on the support of the distribution

This constraint rules out a number of functions. For example, below are a few function types that are common but non-invertible (and thus would not work with inverse CDF):

  1. Constant functions: Any constant function in the form of f(x) = c where c is a constant is not invertible since every input maps to the same output and thus the function is not one-to-one.

2. Certain quadratic functions: For example f(x) = x^2 is non-invertible since it’s many-to-one (consider f(x) = 1, x could be 1 or -1).


3. Certain trigonometric functions: For example, f(x) = sin(x) is not invertible over its entire domain since it is periodic, although it can be made invertible on a restricted domain (e.g., [−π/2, π/2]).


Why does inverse CDF work?

The key idea is that a random variable U uniformly distributed between 0 and 1 can be transformed into a random variable with a desired distribution by applying the inverse of the target distribution’s CDF, which is assumed to be known and tractable. The one-line proof: if X = F^(−1)(U), then P(X ≤ x) = P(U ≤ F(x)) = F(x), so X has CDF F.
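This claim is easy to verify empirically. The sketch below (assumed setup: the Exp(1) example used elsewhere in this article) checks that P(X ≤ t) matches F(t) = 1 − e^(−t) for a few values of t:

```python
import numpy as np

# Empirical check of the inverse-transform argument: if U ~ Uniform(0, 1) and
# X = F^-1(U) = -ln(1 - U), then P(X <= t) should equal F(t) = 1 - exp(-t)
rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 100_000)
x = -np.log(1 - u)

for t in [0.5, 1.0, 2.0]:
    empirical = np.mean(x <= t)
    exact = 1 - np.exp(-t)
    print(f"t={t}: empirical={empirical:.4f}, exact={exact:.4f}")
```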

Advantages

  1. Algorithmic simplicity: it’s very easy to implement for low-dimensional data, which gives it a wide range of applications across fields and tasks.
  2. Sample accuracy: assuming the CDF and its inverse exactly represent the target distribution, the method yields exact, independent samples, unlike approximate methods such as MCMC, which we will see soon.

Disadvantages

  1. Computational complexity: For some distributions, the inverse CDF may not have a closed-form expression, making computation challenging or expensive.
  2. Difficulty with high dimensionality: It can be difficult to apply in high-dimensional spaces, especially with dependencies between variables.
  3. Invertibility constraint: Anytime a CDF is non-invertible, this method becomes invalid. This excludes a number of functions as we saw above.
  4. Limited to known distributions: Inverse CDF requires the exact form of the CDF, limiting its application to known distributions only.

Taking all these limitations into account, inverse CDF applies only to a limited family of distributions. In practice, with high-dimensional data and unknown distributions, the method quickly becomes inapplicable.

With these advantages and disadvantages in mind, let’s now look at another random sampling framework that tackles these limitations: Markov Chain Monte Carlo (MCMC).

As we saw just now, the inverse CDF transformation method is highly limited, especially with high dimensional sample spaces. Markov Chain Monte Carlo (MCMC), on the other hand, scales well with dimensionality, enabling us to sample from a much larger family of distributions.


How does the Metropolis-Hastings algorithm work?

In intuitive terms, the algorithm works as follows. As with inverse CDF, we have a target distribution that we’re sampling from. However, we need an additional ingredient: a proposal distribution q(z | z^τ) that depends on the current state z^τ. Drawing candidates from it creates a Markov chain of samples z¹, z², z³, …. Each candidate sample is accepted into the chain only if it satisfies a certain criterion, which we will define below since this criterion differs across variations of the algorithm.

Let’s formalize it into a more algorithmic structure.

The algorithm runs in cycles, and each cycle follows these steps:

  1. Generate a sample z* from the proposal distribution.
  2. Accept the sample with an acceptance probability, which in Metropolis-Hastings is defined as:

A(z*, z^τ) = min(1, [p(z*) q(z^τ | z*)] / [p(z^τ) q(z* | z^τ)])

where

  • z^τ is the current state.
  • z* is the proposed new state.
  • p(z^τ) is the probability of the current state z^τ according to the desired distribution.
  • p(z*) is the probability of the proposed state z* according to the desired distribution.
  • q(z* | z^τ) and q(z^τ | z*) are the forward and reverse proposal probabilities.

The logic behind this acceptance threshold is that it ensures that the more probable states (according to the desired distribution) are visited more often.

Now, this is the most generalized version of the algorithm; if the proposal distribution is known to be symmetric, meaning that the probability of proposing a move from state z to z′ is the same as that of proposing a move from z′ to z, i.e. q(z′ | z) = q(z | z′), then we can use a special case of Metropolis-Hastings that requires a simpler acceptance threshold.

Metropolis algorithm for Symmetric Distributions

This is a specific MCMC algorithm that we choose to use if the proposal distribution is symmetric, i.e. q(z⁰ | z¹) = q(z¹ | z⁰) for all states z⁰ and z¹, interpreted as “the probability of transitioning from any state A to state B is equal to the probability of transitioning from B to A”. So, each step of the algorithm becomes:

  1. Generate a sample z* from the proposal distribution.
  2. Accept the sample with probability:
A(z*, z^τ) = min(1, p(z*) / p(z^τ))

Metropolis-Hastings and Metropolis Algorithms

Let’s look at the algorithms side by side. As we saw before, the only difference is the acceptance threshold; all other steps of the algorithms run identically:

[Table: Metropolis-Hastings vs. Metropolis — identical steps except for the acceptance threshold]
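To make the comparison concrete, here is a minimal random-walk Metropolis sketch (not the article’s exact setup: the standard-normal target, step size, and burn-in length are all illustrative assumptions). Note that the target only needs to be known up to its normalization constant:

```python
import numpy as np

rng = np.random.default_rng(42)

def p_tilde(z):
    # Target known only up to a constant: standard normal, unnormalized
    return np.exp(-0.5 * z**2)

n_steps, step_size = 50_000, 1.0   # arbitrary tuning choices
z = 0.0                            # initial state
chain = np.empty(n_steps)
for t in range(n_steps):
    z_star = z + rng.normal(0.0, step_size)  # symmetric Gaussian proposal
    # Metropolis acceptance: accept with probability min(1, p(z*) / p(z^tau))
    if rng.uniform() < p_tilde(z_star) / p_tilde(z):
        z = z_star
    chain[t] = z

samples = chain[5_000:]  # discard burn-in
print(samples.mean(), samples.std())  # should be close to 0 and 1
```

Because the Gaussian proposal is symmetric, the q terms cancel and only the ratio p(z*) / p(z^τ) appears in the acceptance test.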

Advantages

  1. Convergence to Equilibrium Distribution: under mild conditions, the random-walk chain converges to the desired equilibrium distribution, although convergence can take a long time in high-dimensional spaces.
  2. Low Computational Cost: Random walk often requires fewer computational resources compared to other complex sampling methods, making it suitable for problems where computational efficiency is a priority.
  3. Versatility of application: Due to its high similarity to naturally occurring patterns, Random Walk has applications in a wide variety of fields:
    • Physics: Brownian motion of molecules in liquids and gases.
    • Network Analysis
    • Financial Markets: to model stock price movements
    • Population Genetics

Disadvantages:

  1. Sensitive to initialization: The algorithm’s performance can be sensitive to the choice of the starting values, especially if the initialized values are far from the high-density areas.
  2. Locality traps: Depending on the complexity of the proposal distribution, the algorithm could get stuck in local optima and have difficulty traversing to other areas along the distribution.

Now, keeping the Metropolis-Hastings algorithm in mind, let’s look at another special instance of it: Gibbs Sampling.

Gibbs Sampling is a special instance of Metropolis-Hastings where each step is always accepted. Let’s first look at the Gibbs sampling algorithm itself.

How does the Gibbs algorithm work?

The idea is relatively simple and is best illustrated by first zooming in on a micro example involving sampling from a distribution p(z1, z2, z3) over 3 variables. The algorithm would run in the following steps:

  1. At timestep τ, initialize the starting values:

     z1^(τ), z2^(τ), z3^(τ)

  2. Draw a new value for z1 from its conditional given the other two variables:

     z1^(τ+1) ~ p(z1 | z2^(τ), z3^(τ))

  3. Draw a new value for the second position, z2, from its conditional:

     z2^(τ+1) ~ p(z2 | z1^(τ+1), z3^(τ))

  4. Finally, draw a new value for the last position, z3:

     z3^(τ+1) ~ p(z3 | z1^(τ+1), z2^(τ+1))

  5. Repeat this process, cycling through the three variables z1…z3, until a satisfactory convergence criterion is reached.

Generalized algorithm

Formally, the algorithm first initializes the starting positions z_i^(1) for i = 1, …, M, and then takes T consecutive steps. At step τ, each variable is resampled in turn from its full conditional, given the most recent values of all the others:

z_i^(τ+1) ~ p(z_i | z_1^(τ+1), …, z_(i−1)^(τ+1), z_(i+1)^(τ), …, z_M^(τ))
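As an illustration of the cycle above, here is a minimal Gibbs sampler sketch for a target chosen for convenience: a standard bivariate normal with correlation ρ, whose full conditionals are known in closed form (the value ρ = 0.8 and the chain length are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8            # correlation of the bivariate normal target
n_steps = 20_000

z1, z2 = 0.0, 0.0    # initial state
samples = np.empty((n_steps, 2))
for t in range(n_steps):
    # Full conditionals of a standard bivariate normal with correlation rho:
    #   z1 | z2 ~ N(rho * z2, 1 - rho^2), and symmetrically for z2 | z1
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho**2))
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho**2))
    samples[t] = (z1, z2)

samples = samples[1_000:]  # discard burn-in
print(np.corrcoef(samples.T)[0, 1])  # should be close to rho = 0.8
```

Each conditional draw is accepted unconditionally; there is no accept/reject step anywhere in the loop.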

Conditions for Gibbs to sample from a target distribution correctly

  1. Invariance. The target distribution p(z) is invariant under each Gibbs step, and therefore p(z) is invariant under the entire Markov chain.
  2. Ergodicity. If the conditional distributions are all non-zero, ergodicity follows, since any point in z space is then reachable in a finite number of steps.
  3. Sufficient burn-in. As with any method that requires random initialization, the first samples depend on the initialization; this dependence weakens only after many iterations, so the initial portion of the chain should be discarded.

How does this relate to Metropolis-Hastings?

In Metropolis-Hastings, we defined the acceptance probability to be:

A(z*, z) = min(1, [p(z*) q(z | z*)] / [p(z) q(z* | z)])

A Gibbs update of variable k can be viewed as a Metropolis-Hastings step with proposal q_k(z* | z) = p(z*_k | z_(−k)), where z_(−k) denotes all variables except z_k; the proposal leaves those variables unchanged, so z*_(−k) = z_(−k). Factoring p(z) = p(z_k | z_(−k)) p(z_(−k)), the acceptance ratio becomes:

[p(z*) q(z | z*)] / [p(z) q(z* | z)] = [p(z*_k | z_(−k)) p(z_(−k)) p(z_k | z_(−k))] / [p(z_k | z_(−k)) p(z_(−k)) p(z*_k | z_(−k))] = 1

Thus, the Metropolis-Hastings proposal steps are always accepted, which is exactly what we saw in the Gibbs algorithm.

Variations

Since the Gibbs method updates one variable at a time, there are strong dependencies between consecutive samples. To overcome this, we could use an intermediate strategy: sample groups of variables jointly instead of individual variables, a strategy known as blocked Gibbs sampling.

Similarly, by the nature of Markov chains, successive samples are correlated. To obtain roughly independent samples, we can thin the chain, i.e., keep only every k-th sample in the sequence.
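Here is a quick illustration of thinning, using a toy AR(1) chain as a stand-in for correlated MCMC draws (the parameter values are illustrative assumptions):

```python
import numpy as np

# Toy AR(1) chain standing in for correlated MCMC draws
# (phi = 0.9 is an arbitrary illustrative value)
rng = np.random.default_rng(1)
phi, n = 0.9, 100_000
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = phi * chain[t - 1] + rng.normal()

def lag1_autocorr(x):
    # Sample lag-1 autocorrelation
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

print(lag1_autocorr(chain))        # highly correlated (about 0.9)
print(lag1_autocorr(chain[::50]))  # near zero after keeping every 50th draw
```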

Now that we’ve walked through how each algorithm works and its application areas, let’s summarize the defining characteristic of each method.

1. Inverse Transformation Sampling

  • Data Size: Best for moderate-sized datasets.
  • Time: Generally efficient for univariate distributions.
  • Data Complexity: Use for simple distributions where the cumulative distribution function (CDF) and its inverse are known and easy to compute.
  • Consider avoiding if: Sampling high-dimensional variables/distributions.
  • Biggest Advantage: High accuracy if the CDF accurately reflects the target distribution.
  • Requirement: The CDF must be known and invertible.

2. Metropolis-Hastings (MCMC)

  • Data Size: Scalable and suitable for large datasets.
  • Time: Can be computationally intensive, depending on the complexity of the target distribution.
  • Data Complexity: Ideal for complex or multi-modal distributions.
  • Biggest Advantages:
    - Can sample from a distribution while knowing it only up to its normalization constant (i.e., without the full form)
    - Great for exploring the global structure of a distribution, with convergence to the target guaranteed asymptotically under mild conditions
  • Disadvantage: May suffer from very slow convergence with
    - complex or multimodal target distributions, since the algorithm may get stuck in local modes and have difficulty transitioning between them;
    - highly correlated variables;
    - high-dimensional spaces;
    - poor choices of initial values or step size

3. Gibbs Sampling

  • Data Size: Suitable for both small and large datasets.
  • Time: Often more efficient than Random Walk Metropolis-Hastings since it doesn’t require acceptance/rejection steps.
  • Data Complexity: Best used when dealing with high-dimensional distributions where you can sample from the conditional distribution of each variable.
  • Biggest Advantages:
    - Can easily compute conditional distributions;
    - Less prone to getting trapped in local modes than random-walk Metropolis.
  • Requirements:
    - Markov chain ergodicity
    - The full conditional distributions must be known and tractable

In summary:

Thanks for sticking with me this far! In this article, we looked at 3 key approximate sampling methods: Inverse CDF, Metropolis-Hastings MCMC, and Gibbs Sampling MCMC. We explored how each algorithm works, its advantages and disadvantages, and typical use cases.

Inverse CDF provides a straightforward method to sample from a known distribution when its CDF is invertible. It’s computationally efficient but less suitable for high-dimensional or complex distributions.
Metropolis-Hastings MCMC offers a more general approach that allows sampling from distributions that are difficult to tackle otherwise. However, it requires more computational resources and can be sensitive to tuning choices such as the proposal distribution.
Gibbs Sampling MCMC is especially efficient when the joint distribution is complex but can be broken down into simpler conditional distributions. It’s widely used in machine learning, although it can be slow to converge and memory-intensive for high-dimensional problems.

