set.seed(42)12 Generate Random Variables
Reference: Rizzo (2019) Chapter 3
12.1 Remember to Set the Random Seed
If the seed is not set by a default value, each time I render the webpage, the R codes will run, and different results may given. In some documents, the text will require to change in order to match the code output. In your project, set it whatever you like, 42, 2026, 3306, 3389, 65535, 114514, …
12.2 Direct approach in R
# Uniform (0,1)
runif(n = 10, min = 0, max = 1) [1] 0.9148060 0.9370754 0.2861395 0.8304476 0.6417455 0.5190959 0.7365883
[8] 0.1346666 0.6569923 0.7050648
# Bin(30, 0.6)
rbinom(n = 10, size = 30, prob = 0.6) [1] 18 16 14 20 18 14 13 21 18 18
# Poisson(7)
rpois(n = 5, lambda = 7)[1] 11 4 14 12 4
# Beta(5,3)
rbeta(n = 10, shape1 = 5, shape2 = 3) [1] 0.6180147 0.3383139 0.4158620 0.4378221 0.5261561 0.4187009 0.7703246
[8] 0.5679070 0.6560498 0.5524067
12.3 Inverse CDF
For any continuous random variable, the random variable defined by its Cumulative Distribution Function (CDF) evaluated at itself follows a U(0,1) distribution.
Thus you can get the random value from the inverse of CDF.
12.3.1 Example: Exp(2)
\[f(x) = 2e^{-2x} \quad F(x) = 1 - e^{-2x}\]
Thus, \(U = F(x)\) we have \(x = -\frac{1}{2} \log{(1-U)}\)
- Generate \(U \sim Uniform(0,1)\) (10,000) uniform random numbers
- Compute \(X = -\frac{1}{2} \log{U}\)
m <- 100000
U <- runif(m)
X <- -1/2 * log(U)
hist(X, freq = F)
12.3.2 Example: Beta(5,3)
\[f(x) = \frac{1}{Beta(5,3)} x^4 (1-x)^2\]
ReverseCDF <- function(N) {
return(qbeta(runif(N),5,3))
}
b0 <- ReverseCDF(50)
hist(b0, freq = F, ylim = c(0, 2.8))
curve(dbeta(x, 5, 3), add = T)
12.4 Acceptance-Rejection Method
You want to generate \(X\sim f_X\) but it’s difficult to do directly.
Assume you have \(Y\sim g_Y\) which is easy to generate, and \[ \frac{f(t)}{g(t)}\leq c, \forall t \text{ s.t. } f(t)>0 \]
Then generate \(u\) from \(U \sim U(0,1)\) and \(y\) from \(Y\sim g_Y\), accept \(y\) if \(u < \frac{f(t)}{c g(t)}\), otherwise drop it.
In some textbooks, \(M = sup \frac{f(t)}{g(t)} < \infty \Rightarrow \frac{f(t)}{Mg(t)}\leq 1\). The \(M\) here is equal to the \(c\) mentioned earlier.
12.4.1 Example: Beta(5,3)
\[f(x) = \frac{1}{Beta(5,3)} x^4 (1-x)^2\]
\[f'(x) = \frac{1}{Beta(5,3)} [4x^3 (1-x)^2 - 2x^4 (1-x)] = \frac{2x^3(1-x)(2-3x)}{Beta(5,3)} \]
Then we have \(M = sup \frac{f(t)}{g(t)} = f(\frac{2}{3})\)
b1 <- rbeta(50, 5,3)
hist(b1, freq = F, ylim = c(0, 2.8))
curve(dbeta(x, 5, 3), add = T)
b2 <- c()
M <- dbeta(2/3, 5,3)
while (length(b2) < 50){
u1 <- runif(1)
y <- runif(1)
if(u1 < dbeta(y,5,3)/M){
b2 <- c(b2,y)
}
}
hist(b2, freq = F, ylim = c(0, 2.8))
curve(dbeta(x, 5, 3), add = T)
There’re some performance weakness: - bad append operation b2 <- c(b2,y) - non-vectorized operation
Let’s optimize!
N <- 50
M <- dbeta(2/3, 5,3)
N1 <- as.integer(N * M * 1.2)
u <- runif(N1)
y <- runif(N1)
b3 <- y[u < dbeta(y,5,3)/M]
print(paste(c("target is", N, ", and we generated", N1, "finally get", length(b3)), collapse = " "))[1] "target is 50 , and we generated 138 finally get 54"
b3 <- b3[1:N]
hist(b3, freq = F, ylim = c(0, 2.8))
curve(dbeta(x, 5, 3), add = T)
12.4.2 Example: Beta(2,2)
M <- dbeta(0.5,2,2)
mb <- as.integer(m/M * 1.2)
U <- runif(mb)
Y <- runif(mb)
X <- Y[U < dbeta(Y,2,2)/M]
hist(X, freq=F)
curve(dbeta(x,2,2), add = T)
12.5 Transformation Method
iid series \(X_i \sim \Gamma(\alpha_i,\lambda) \\ \implies \frac{\sum_{k=1}^{i}X_k}{\sum_{k=1}^{i+1}X_k}\sim Beta(\sum_{k=1}^{i}\alpha_k,\alpha_{i+1})\) and \(\sum_{k=1}^{n}X_k \sim \Gamma(\sum_{i=1}^{n} \alpha_i,\lambda)\) independent
\[ U,V \stackrel{iid}{\sim} U(0,1) \to \left\{ \begin{array}{cl} Z_1 & = \sqrt{-2\log U}cos{2\pi V} \\ Z_2 & = \sqrt{-2\log V}cos{2\pi U} \end{array}\right., Z_1, Z_2 \stackrel{iid}{\sim} N(0,1) \]
m <- 100000
U1 <- runif(m)
U2 <- runif(m)
X1 <- sqrt(-2*log(U1)) * cos(2 * pi * U2)
X2 <- sqrt(-2*log(U1)) * sin(2 * pi * U2)
cor(X1, X2)[1] 0.00217154
par(mfrow = c(1,2))
hist(X1)
hist(X2)
par(mfrow = c(1, 1))12.6 Benchmark
library(microbenchmark)
rejection_scalar <- function(N) {
b2 <- numeric(N)
M <- dbeta(2/3, 5, 3)
count <- 0
while (count < N) {
u1 <- runif(1)
y <- runif(1)
if (u1 < dbeta(y, 5, 3) / M) {
count <- count + 1
b2[count] <- y
}
}
return(b2)
}
rejection_vectorized <- function(N) {
M <- dbeta(2/3, 5,3)
N1 <- as.integer(N * M * 1.2)
u <- runif(N1)
y <- runif(N1)
b3 <- y[u < dbeta(y,5,3)/M]
return(b3[1:N])
}
results <- microbenchmark(
Reverse = ReverseCDF(5000),
Scalar_Loop = rejection_scalar(5000),
Vectorized = rejection_vectorized(5000),
Built_in = rbeta(5000, 5, 3),
times = 100
)Warning in microbenchmark(Reverse = ReverseCDF(5000), Scalar_Loop =
rejection_scalar(5000), : less accurate nanosecond times to avoid potential
integer overflows
resultsUnit: microseconds
expr min lq mean median uq max
Reverse 1993.010 2045.244 2087.0689 2058.774 2101.9675 2648.764
Scalar_Loop 11824.605 12287.864 12740.0891 12699.258 13142.7755 15216.863
Vectorized 719.509 730.374 763.4089 739.804 752.0835 2280.748
Built_in 247.476 252.109 259.1065 254.979 257.5415 380.111
neval
100
100
100
100