Q3 ’15: Only little change in overall economic conditions

Yesterday the DIW Berlin published its most recent short-term outlook in the form of the DIW Economic Barometer, the institute’s flash indicator of the state of the German economy, for Q3 ’15. The indicator is continuously updated based on the most recent data, this time as of September 30. The overall indicator is a unitless index scaled to have an unconditional mean of 100 (signalling normal conditions), where smaller values signal below-average economic conditions and higher values above-average conditions. Historically, a value of 100 was on average associated with q-o-q real GDP growth of 0.3%. The most recent value of the indicator for Q3 is roughly 105 points, i.e. it indicates conditions that are mildly above average.

DIW Economic Barometer with contributions from different sectors

This is also slightly higher than in the previous month, when the indicator for Q3 stood at 103.8 points. Still, the indicator is slightly down from its Q2 value of around 107. The team at DIW thus sees a good chance that Q3 growth might pick up again a bit, climbing from 0.4 to 0.5%.

Considering the sectoral decomposition of the contributions to the overall indicator, it becomes clear that it is primarily the service-sector and labor-market data (shaded in dark grey and grey) that have been driving the positive trend in the indicator. This has basically been the recurrent theme over the course of the whole year. Only in the second quarter did the industrial sector (green bars) chip in an additional 1.1 index points and drive the index to its peak at around 107; otherwise, services and robust employment data were the main drivers.
With the latest quarterly value again estimated above the neutral level of 100 points, the Barometer continues a quite remarkable streak of above-average values. In fact, if one uses the interactive chart at the DIW site to zoom into the values that followed the crisis in 2009, it becomes clear that economic conditions have been above average ever since the second half of 2013.

DIW Economic Barometer with contributions from different sectors

Only in 2012, and very briefly in 2011, did conditions worsen to an extent that they were below average. Approaching the second half of 2013, the indicator entered the current streak of continuously positive (i.e. above-average) values. It therefore appears that despite some headwinds, e.g. in the form of the economic weakening in China, the German economy is in very robust shape, with little indication of a downward trend in the near future.

As always, see here for the full discussion with interactive charts and more data (in German) at the DIW Berlin website. The construction of the index is documented here, and FAQs are answered here.

Q3 ’15: DIW Economic Barometer slightly weaker

On August 26 my former colleagues at DIW Berlin published the Q3 ’15 value of the institute’s flash indicator of the state of the German economy. The overall indicator is a unitless index scaled to have an unconditional mean of 100 (signalling normal conditions), where smaller values signal below-average economic conditions and higher values above-average conditions. Historically, a value of 100 was on average associated with q-o-q real GDP growth of 0.3%.

DIW Economic Barometer with contributions from different sectors

With a current value of 103.8 the indicator is slightly down from its Q2 value of around 107, but still above its neutral level of 100. In the previous quarter the team at DIW correctly predicted that q-o-q growth would amount to 0.4% in Q2 ’15, and despite a slightly weaker index they remain confident that Q3 growth won’t be much different.

While the quarterly index is the headline number of the Barometer, it is always worth taking a look at the monthly series. The Barometer is based on monthly data, and in fact the primitive index behind the Barometer is a monthly index that is aggregated using the Mariano and Murasawa (2003) pattern to form the quarterly index. I like to look at the monthly index because it often provides a more timely indication of the current trend (upward, downward or sideways), even though it is much more erratic.

DIW Economic Barometer on monthly basis

If you consider the development over the past two years (beginning January 2013), the index climbed gradually towards a peak in January 2014 and subsequently followed a very mild downward trend. In 2015 the index has been less erratic than in the previous year, but it nonetheless appears to continue to decrease gradually. Note, however, that the neutral level of the monthly index is 33 and that it still remains above that level, at around 35 points in August (the September prediction is roughly neutral at 33-point-something).
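
To make the link between the monthly and the quarterly index concrete, here is a minimal sketch of the Mariano and Murasawa (2003) aggregation pattern in R. The weights below are the textbook ones (they sum to three, which is why a neutral monthly level of 33 corresponds to a neutral quarterly level of roughly 100); the exact implementation used by DIW may differ.

# A quarterly value is approximately a weighted sum of the current and four
# lagged monthly values, with weights summing to three.
mm_weights <- c(1, 2, 3, 2, 1) / 3
monthly <- rep(33, 24)                               # a flat monthly index at its neutral level
quarterly <- stats::filter(monthly, mm_weights, sides = 1)
tail(quarterly, 1)                                   # roughly 99, i.e. close to the neutral quarterly level of 100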

See here for the full discussion with interactive charts and more data (in German) at the DIW Berlin website. The construction of the index is documented here, FAQs are answered here.

Gaussian processes for regression and classification

In my previous post on Gaussian process regression, I described the intuition behind the function-space view on GPs. In this post I want to continue by illustrating how to use Gaussian processes to do regression and classification on a small example dataset. To fit the model to the data I will use Stan. In case you haven’t heard about Stan or its R interface, it is definitely worth having a look. I use Stan here instead of native R because Stan makes it really, really simple to operationalize GP regressions and requires – quite frankly – minimal programming effort. First, this is because there is example code in the manual that shows how to implement some basic models, and the GitHub repo features more example models. Second, in Stan I only have to specify the probability model; no further derivations of properties of the posterior are required. Even though there is a lot of tuning and differentiation going on, it’s all under the hood. Third, the standard posterior simulation algorithm of Stan (HMC or NUTS) is good at handling complex geometries of the posterior, i.e. non-convexities or multiple modes. This becomes particularly relevant when you want to go beyond the bivariate examples here and consider large feature spaces or more complex hierarchical structures. Fourth, we can do full Bayesian inference, e.g. posterior predictive checking.

Kernels revisited

In the previous post about Gaussian processes it already became clear that the mean and covariance functions m(x) and k(x, x') of the Gaussian process are at the heart of building the model

y \sim GP(m(x), k(x, x')).

Nearly all texts that I found about GPs do not explicitly model the mean function m(x), but set it to zero. I don’t think this is necessarily the best choice for every application at hand. If you have substantive prior knowledge that makes you confident enough to postulate a non-zero prior relationship between the mean of y and x, you should arguably include it in the prior. However, the subsection on linear kernels below will illustrate how a GP prior with m(x)=0 and a linear kernel can be shown to represent a prior belief that the data come from a linear model with a zero-mean normal prior on the parameters. Hence, the zero mean function induces shrinkage towards zero (to mitigate overfitting), much like in a ridge regression.

Squared exponential kernel

I mentioned the squared exponential (SE) kernel already in the last post; in the case of a univariate feature vector it is equivalent to setting the ij-th element of the covariance matrix to

k(x_i, x_j) = \theta_1 + \theta_2 \, \text{exp} \left(- \frac{1}{2 \ell^2} (x_i - x_j)^2 \right)

with (hyper-)parameters such that the resulting covariance matrix is always positive definite. In the last post on GPs, I mentioned that one implication of such a prior is that (for a given length scale \ell) the values y_i and y_j are less correlated the further apart they are in terms of the distance between x_i and x_j. What do functions from this prior distribution look like? We can use Stan to simulate processes from the GP prior. The Stan manual provides examples of how to implement this, and I largely copy their code here:

stan_code <- '
// Sample from Gaussian Process
// Hyperparameters of the covariance functions are part of the data

data {
int<lower=1> N;
real x[N];
real<lower=0> ell;
real<lower=0> theta1;
real<lower=0> theta2;
}
transformed data {
matrix[N,N] L;
vector[N] mu;
cov_matrix[N] Sigma;
mu <- rep_vector(0, N);

for(i in 1:N)
for(j in i:N){
Sigma[i, j] <- theta1 + theta2 * exp(-(1/(2 * pow(ell,2))) * pow(x[i] - x[j], 2));
}

for(i in 1:(N-1))
for(j in (i+1):N){
Sigma[j, i] <- Sigma[i, j];
}

L <- cholesky_decompose(Sigma);
}
parameters {
vector[N] z;
}
model {
z ~ normal(0,1);
}
generated quantities {
vector[N] y;
y <- mu + L * z;
}'
library(rstan)

stan_data <- list(N=100,
x=1:100,
ell=1,
theta1=1,
theta2=1)

model <- stan(model_code = stan_code,
data=stan_data,
iter=0, chains=0, verbose =FALSE)

which we can run from R using rstan

ells <- c(0.000095, 1.1, 3.1)
offset <- c(-10, 0, 10)

for (i in 1:length(ells)) {
sims <- stan(fit = model, data = list(N = 100, x = 1:100, ell = ells[i],
    theta1 = 1, theta2 = 5), iter = 50, chains = 1, seed = c(123), verbose = FALSE)
tmp <- extract(sims, pars = "y")
if (i == 1) {
plot(offset[i] + tmp$y[15, ], type = "l", col = i, lwd = 2, ylim = c(-15,
    15), ylab = "y", xlab = "x", sub = "Lines offset by -10, 0 and 10 for illustration")
legend("topleft", legend = c(paste("l = ", ells)), lwd = rep(2, 3),
    col = 1:3)
} else {
lines(offset[i] + tmp$y[15, ], type = "l", col = i, lwd = 2)
}
}

plot of chunk simulate1

The plots confirm a simple intuition: the values of y become more and more correlated across different values of x the higher the length scale \ell is, because the term inside the exponential is pulled towards zero. Another view on \ell that explains its name is that it scales the x-axis. Imagine two points x_i and x_j with Euclidean distance d. The term inside the exponential is - \frac{1}{2\ell^2} d^2. Hence, increasing \ell is just like decreasing the distance d, i.e. pulling points closer together, which can be understood as a compression of the x-axis. Yet another view on \ell is through the lens of variable or feature selection. In a model with more than a single explanatory variable x_i, each of which has its own individual length scale \ell_i, we can do feature selection through the individual length scales. To see this, note that for \ell_i \rightarrow \infty the influence of variable i in the kernel vanishes and y is essentially constant in that variable.
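
To make the feature-selection interpretation concrete, here is a small sketch of an SE kernel with one length scale per feature (this construction is often called automatic relevance determination). The helper ard_se and the numbers are purely illustrative and not part of the Stan code above.

# A very large length scale effectively switches a feature off.
ard_se <- function(x, xprime, ell, theta2 = 1) {
    theta2 * exp(-0.5 * sum(((x - xprime) / ell)^2))
}

x_a <- c(1, 5); x_b <- c(2, -3)                  # two points with two features each
ard_se(x_a, x_b, ell = c(1, 1))                  # both features influence the covariance
ard_se(x_a, x_b, ell = c(1, 1e6))                # feature 2 is effectively ignored
ard_se(c(1, 5), c(2, 100), ell = c(1, 1e6))      # even a huge change in feature 2 barely matters now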

Linear kernel

The linear kernel has the nice property that it can be derived from a linear model. Doing so is instructive, because it gives an example of how kernels can be elicited from features of the data. Suppose we model the relationship between x and y by a set of basis functions \phi(x); e.g. \phi(x) = [1, x, x^2, x^3, \ldots]' would be a set of basis functions of x that describes a polynomial regression. Hence,

y_i = \phi(x_i) ' \theta + \epsilon_i

is still a linear model, in the sense that it is linear in the parameters \theta. We may write this more compactly

y = Z \theta + \epsilon

where Z = [\phi(x_1)', \ldots, \phi(x_N)']' is the matrix of regressors and \epsilon is a conformable vector of iid noise with mean zero and variance \sigma^2_e. If we impose a normal prior \theta \sim \text{Normal}(0, \sigma_b^2 I_n), where the mean is a vector of zeros and I_n denotes the n-dimensional identity matrix (n being the number of parameters), it turns out that y is also normal with zero mean

\mathbb{E}[y] = \mathbb{E}[Z \theta] = 0

and covariance

\mathbb{E}[y y'] = \mathbb{E}[(Z \theta + \epsilon) (Z \theta + \epsilon)'] = \sigma_b^2 Z Z' + \sigma^2_e I_N, i.e. y \sim \text{Normal}(0, \sigma_b^2 Z Z' + \sigma^2_e I_N).

This means that for any finite number of points [x_1, \ldots, x_N] the joint distribution of the elements in y is Gaussian: this was the defining property of a Gaussian process stated in the previous post. We conclude that

m(x_i) = 0

and

k(x, x') = \sigma^2_b Z Z' + \sigma_e^2 I_N.

The simplest special case is given by a single feature vector x_i and \phi(x_i) = x_i. In that case

K_{i,j} = \sigma^2_b x_i x_j + \sigma_e^2 \mathbf{1}_{\{i=j\}}.

Note that if we want to include an intercept in the linear regression, we can do this either by including a column of ones in Z or by setting

K_{i,j} = \sigma^2_b (x_i - c) (x_j - c) + \sigma_e^2 \mathbf{1}_{\{i=j\}}.

In the latter case, the parameter c gives the x-coordinate at which the prior function will be zero (plus noise, if we include it – as we do here).
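
Before simulating from the GP prior itself, we can verify this equivalence numerically. The following is a quick Monte Carlo check; the particular values for sigma_b, sigma_e and c0 are arbitrary choices for illustration.

# Draw theta from its normal prior, generate y from the linear model and
# compare the empirical covariance of y to the linear-kernel expression.
set.seed(42)
n <- 5; sigma_b <- 2; sigma_e <- 0.5; c0 <- 16
x <- seq(10, 30, length.out = n)
Z <- cbind(x - c0)                               # single centred feature
K_implied <- sigma_b^2 * Z %*% t(Z) + sigma_e^2 * diag(n)

draws <- replicate(50000, drop(Z %*% rnorm(1, 0, sigma_b)) + rnorm(n, 0, sigma_e))
K_empirical <- cov(t(draws))                     # y has mean zero, so cov estimates E[y y']
round(K_implied[1:3, 1:3])
round(K_empirical[1:3, 1:3])                     # the two agree up to Monte Carlo error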

One can easily simulate such a model to get an idea of how our prior looks:

stan_code <- '
// Sample from Gaussian Process
// Hyperparameters of the covariance functions are part of the data

data {
int<lower=1> N;
real x[N];
real<lower=0> sigma_b;
real<lower=0> sigma_e;
real c;
}
transformed data {
matrix[N,N] L;
vector[N] mu;
cov_matrix[N] Sigma;
mu <- rep_vector(0, N);

for(i in 1:N)
for(j in i:N){
Sigma[i, j] <- sigma_b * (x[i] - c) * (x[j] - c) + if_else(i==j, sigma_e, 0.0);
}

for(i in 1:(N-1))
for(j in (i+1):N){
Sigma[j, i] <- Sigma[i, j];
}

L <- cholesky_decompose(Sigma);
}
parameters {
vector[N] z;
}
model {
z ~ normal(0,1);
}
generated quantities {
vector[N] y;
y <- mu + L * z;
}'
library(rstan)

stan_data <- list(N=100,
x=1:100,
sigma_e=1,
sigma_b=1,
c=16)

model <- stan(model_code = stan_code,
data=stan_data,
iter=0, chains=0, verbose =FALSE)
sigmas <- c(0.02, 0.15, 0.5)

for (i in 1:length(sigmas)) {
sims <- stan(fit = model, data = list(N = 100, x = 1:100, sigma_b = sigmas[i],
sigma_e = 1, c = 16), iter = 50, chains = 1, seed = c(123), verbose = FALSE)
tmp <- extract(sims, pars = "y")
if (i == 1) {
plot(tmp$y[15, ], type = "l", col = i, lwd = 2, ylim = c(-50, 50), ylab = "y",
xlab = "x")
legend("topleft", legend = c(paste("sigma_b = ", sigmas)), lwd = rep(2,
3), col = 1:3)
} else {
lines(tmp$y[15, ], type = "l", col = i, lwd = 2)
}
}

plot of chunk gp-in-action-simulate2

The plots show that our prior functional form is indeed linear, with gradual fluctuations around this linear trend. The amplitude of these fluctuations is governed by the error variance: the higher it is, the more erratic the prior. Likewise, the smaller the prior variance of the coefficients, the more likely we think the linear trend is actually just flat. This shows that even with the mean function set to zero, we may still model trending data solely in terms of the covariance function. The sampled lines cross the x-axis around the value c=16, as described above. Since we add noise to the observations, the sampled paths are not exactly zero at this point. Apart from the noise, however, there is no uncertainty at this point, i.e. the prior variance is minimal there and gradually increases as we move away from it.

Kernels, kernels, kernels…

There are many, many more kernels that have been proposed in different places. I found this little website by David Duvenaud quite helpful (make sure you also check out the chapter of his thesis that is linked on his page) to get (a) an overview of several prominent kernels and (b) an idea of how to combine them to yield ever more flexible structures.
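
The most common ways to combine kernels are addition and multiplication, both of which again yield valid kernels. Here is a tiny sketch in plain R, just to illustrate the idea; these functions are not used elsewhere in this post.

k_se  <- function(x, y, ell = 1, s = 1) s * exp(-0.5 * (x - y)^2 / ell^2)
k_lin <- function(x, y, sb = 1, c0 = 0) sb * (x - c0) * (y - c0)

k_sum  <- function(x, y) k_lin(x, y) + k_se(x, y)   # linear trend plus local wiggles
k_prod <- function(x, y) k_lin(x, y) * k_se(x, y)   # smooth variation whose amplitude grows with the distance from c0

The additive combination is exactly what the LIN + SE kernel used for the ice cream data below does: it adds an SE term to the linear kernel.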

Univariate Gaussian process regression: icecream

In order to illustrate how to operationalize GP regressions, I decided to pick up on a little demonstration on Markus Gesmann's blog again. Hence, I wrap my discussion around the icecream dataset that Markus used to illustrate some general features of generalized linear models.

icecream <- data.frame(temp = c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4,
22.1, 22.6, 23.4, 25.1), units = c(185L, 215L, 332L, 325L, 408L, 421L, 406L,
412L, 522L, 445L, 544L, 614L))
basicPlot <- function(...) {
plot(units ~ temp, data = icecream, bty = "n", lwd = 2, main = "Number of ice creams sold",
col = "#00526D", xlab = "Temperature (Celsius)", ylab = "Units sold",
...)
axis(side = 1, col = "grey")
axis(side = 2, col = "grey")
}
basicPlot()

plot of chunk gp-in-action-dataPlot

Markus has a very nice demonstration of how to gradually build a model that fits the data and its properties. From the simple scatter plot we can already see that if we wish to start modelling the data by means of a GP regression, there is little hope we can employ a simple SE kernel. Instead of being characterized by gradual up- and down-swings, the data exhibit a pretty obvious trend, and I thus begin with a linear kernel.

LIN_kernel <- '
data {
int<lower=1> N1;
int<lower=1> N2;
vector[N1 + N2] x;
int<lower=0> y1[N1];
}
transformed data {
int<lower=1> N; // Total length of the feature vector
vector[N1 + N2] mu; // Prior mean of the process

N <- N1 + N2;
mu <- rep_vector(0, N);
}
parameters {
vector[N2] y2;
real<lower=0> beta1;
real<lower=0> beta2;
real<lower=0> beta3;
}
model {
matrix[N, N] Sigma;
vector[N] y;

for (n in 1:N1) y[n] <- y1[n];
for (n in 1:N2) y[N1 + n] <- y2[n];

for(i in 1:N)
for(j in i:N){
Sigma[i, j] <- beta1 * (x[i] - beta2) * (x[j] - beta2) + if_else(i==j, beta3, 0.0);
}

for(i in 1:(N-1))
for(j in (i+1):N){
Sigma[j, i] <- Sigma[i, j];
}

beta1 ~ cauchy(0,5);
beta2 ~ cauchy(0,5);
beta3 ~ cauchy(0,5);

y ~ multi_normal(mu, Sigma);
}'

x1 <- icecream$temp
N1 <- nrow(icecream)
idx1 <- (1):(N1)
x2 <- seq(0, 40, length.out = 40)
N2 <- length(x2)
idx2 <- (N1+1):(N1+N2)
x <- union(x1, x2)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores()-1)

stan_data <- list(N1 = N1,
N2 = N2,
x = x,
y1 = icecream$units
)

ice.fit <- stan(model_code = LIN_kernel,
data=stan_data,
iter=0, chains=0)

I should mention that I went with the default priors set in some of the example models from the manual. The motivation for using the Cauchy prior is that in many settings the Cauchy with location zero and scale five is weakly informative, i.e. the implied prior effect sizes encompass any reasonable value with large probability – and values we would consider unreasonable still get enough mass not to be entirely precluded. But I didn’t really do any sanity checks here, so to build a serious model we would of course have to go through a more careful prior elicitation.
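
To get a rough feeling for what cauchy(0, 5) implies once a parameter is constrained to be positive (so that it acts like a half-Cauchy), we can look at a few of its quantiles:

# Half-Cauchy(0, 5) quantiles correspond to full-Cauchy quantiles at (1 + p) / 2.
p <- c(0.5, 0.9, 0.99)
qcauchy((1 + p) / 2, location = 0, scale = 5)
## approximately 5.0, 31.6 and 318.3

Half of the prior mass lies below 5, yet about 1% of it lies above roughly 318, so moderate values are favoured without precluding large ones.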

ice.res <- stan(fit = ice.fit, data = stan_data, iter = 800, chains = 2, seed = 123)

Note that I have only run 800 iterations on two chains. If you are familiar with other posterior simulation algorithms, e.g. Metropolis-Hastings, you might chuckle a bit, because these algorithms often require north of several thousand iterations to achieve convergence even for simple models. Plus, they are marred by large amounts of autocorrelation between samples from the posterior, which in turn makes a large number of iterations necessary. With Stan, 800 iterations are really enough for this simple model.

We can confirm this by looking at the summary statistics for the hyperparameters:

paras = c('beta1', 'beta2', 'beta3')
tmp.sum <- summary(ice.res, pars = paras)
tmp <- tmp.sum$summary[, "Rhat"]
tmp

## beta1 beta2 beta3
## 1.004882 1.042516 1.035404

The potential scale reduction factor Rhat indicates convergence for all parameters. Let us look at the autocorrelation:

test <- as.matrix(ice.res, pars = paras)
colnames(test) <- paras
par(mfcol = c(1, 3))
sapply(1:3, function(i) acf(test[, i], main = colnames(test)[i]))

plot of chunk gp-in-action-acf

Pretty good, and with just a few hundred iterations!

Hence, we can go on and plot the fit of the model together with out-of-sample predictions and some credibility regions (in case you are re-running this on your computer, you certainly do not want to follow me here and explore the models using static graphs, but use ShinyStan instead). One nice thing about doing Bayesian inference here is that the uncertainty bands surrounding our estimates (i) are actual posterior intervals (which many people wrongly equate with confidence intervals), (ii) do not have a rigid theoretical form that hinges on asymptotic approximations and (iii) price in the estimation uncertainty surrounding the unknowns in our model, e.g. the hyperparameters in the kernel.

library(dplyr)
ys <- extract(ice.res, pars = "y2")$y2               # posterior draws for y at the prediction points in x2
ym <- apply(ys, 2, median)
ysd <- apply(ys, 2, sd)

post.var <- extract(ice.res, pars = "beta3")$beta3   # draws of the noise term on the kernel diagonal
all.preds <- ys
sd.data <- array(NA, c(nrow(all.preds), nrow(icecream)))
for (i in 1:nrow(icecream)) {
    tmp <- rnorm(nrow(all.preds), icecream$units[i], sqrt(post.var))
    sd.data[, i] <- tmp
}
all.bands <- apply(cbind(sd.data, all.preds), 2, quantile, probs = c(0.025, 0.16,
    0.84, 0.975))

plot.data <- data.frame(temp = c(icecream$temp, x2), sold = apply(cbind(sd.data,
    all.preds), 2, median), lowest = all.bands[1, ], lower = all.bands[2, ],
    higher = all.bands[3, ], highest = all.bands[4, ])
daten <- plot.data %>% tbl_df %>% arrange(temp)

plot(daten$temp, daten$sold, type = "l", main = "Number of ice creams sold",
col = "#00526D", xlab = "Temperature (Celsius)", ylab = "Units sold", sub = "Dashed lines show 68% and 95% posterior interval, solid line depicts posterior median",
ylim = c(0, 900))
abline(h = 0, v = 0)
points(icecream$temp, icecream$units)
lines(daten$temp, daten$lowest, lty = 2, col = "grey")
lines(daten$temp, daten$highest, lty = 2, col = "grey")
lines(daten$temp, daten$lower, lty = 2)
lines(daten$temp, daten$higher, lty = 2)

plot of chunk gp-in-action-linear-model

Note that the plot not only shows predictions to the left and the right of our data points, i.e. for the region below 12 and above 26 degrees, but also between points. If you look at the code above, you’ll see that I have squeezed some x-values between some of the observations for the algorithm to generate. Any point that is not directly observed can be (and in fact is) treated like a parameter we can estimate. You can see that estimation uncertainty remains low between two points, and that this is not just because we have drawn a straight line between the ends of the pointwise error bands – it really is fairly low. You can also see that the uncertainty increases slightly to the left and the right of our observations, which seems reasonable.

Just to illustrate how it affects our inference, let me add an SE kernel to the linear kernel:

LIN_SE_kernel <- '
data {
int<lower=1> N1;
int<lower=1> N2;
vector[N1 + N2] x;
int<lower=0> y1[N1];
}
transformed data {
int<lower=1> N; // Total length of the feature vector
vector[N1 + N2] mu; // Prior mean of the process

N <- N1 + N2;
mu <- rep_vector(0, N);
}
parameters {
vector[N2] y2;
real<lower=0> beta1;
real<lower=0> beta2;
real<lower=0> beta3;
real<lower=0> beta4;
real<lower=0> beta5;
}
model {
matrix[N, N] Sigma;
vector[N] y;

for (n in 1:N1) y[n] <- y1[n];
for (n in 1:N2) y[N1 + n] <- y2[n];

for(i in 1:N)
for(j in i:N){
Sigma[i, j] <- beta1 * exp(-beta2 * pow(x[i] - x[j], 2)) + beta3 * (x[i] - beta4) * (x[j] - beta4) + if_else(i==j, beta5, 0.0);
}

for(i in 1:(N-1))
for(j in (i+1):N){
Sigma[j, i] <- Sigma[i, j];
}

beta1 ~ cauchy(0,5);
beta2 ~ cauchy(0,5);
beta3 ~ cauchy(0,5);
beta4 ~ cauchy(0,5);
beta5 ~ cauchy(0,5);

y ~ multi_normal(mu, Sigma);
}'

x1 <- icecream$temp
N1 <- nrow(icecream)
idx1 <- (1):(N1)
x2 <- seq(0, 40, length.out = 40)
N2 <- length(x2)
idx2 <- (N1+1):(N1+N2)
x <- union(x1, x2)

stan_data <- list(N1 = N1,
N2 = N2,
x = x,
y1 = icecream$units
)

ice.fit <- stan(model_code = LIN_SE_kernel,
data=stan_data,
iter=0, chains=0)
ice.res <- stan(fit = ice.fit, data = stan_data, iter = 800, chains = 2, seed = 123)
paras = c("beta1", "beta2", "beta3", "beta4", "beta5")
tmp.sum <- summary(ice.res, pars = paras)
tmp <- tmp.sum$summary[, "Rhat"]
# prepare.plot() is defined in the Appendix; noise.par = "beta5" is the noise term of the LIN + SE kernel
daten <- prepare.plot(ice.res, noise.par = "beta5") %>% tbl_df %>% arrange(temp)

plot(daten$temp, daten$sold, type = "l", main = "Number of ice creams sold",
col = "#00526D", xlab = "Temperature (Celsius)", ylab = "Units sold", sub = "Dashed lines show 68% and 95% posterior interval, solid line depicts posterior median",
ylim = c(0, 800))
points(icecream$temp, icecream$units)
lines(daten$temp, daten$lowest, lty = 2, col = "grey")
lines(daten$temp, daten$highest, lty = 2, col = "grey")
lines(daten$temp, daten$lower, lty = 2)
lines(daten$temp, daten$higher, lty = 2)

plot of chunk gp-in-action-BIN-model

I must say I am quite amazed by the plot. The extent to which the replicated data match the actual observations is striking. Additionally, the way we modelled the data based on Markus’ initial suggestions delivers very plausible predictions overall. The model predicts, for example, that 95% of the sales we can make at zero degrees are between 21 and 87 units, with the median being 42 units.
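
In case you want to read such numbers off directly rather than from the plot, you can subset the plot data; the exact numbers will vary slightly between runs.

# The 95% band and the posterior median at zero degrees
subset(daten, temp == 0)[, c("lowest", "sold", "highest")]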

The overall shape of our posterior predictions closely matches the one that comes out of Markus’ binomial model, but you can see that our model is a bit more flexible in capturing the wiggly structure of the data in the middle.

Conclusion

I am happy to have taken the chance to learn about Gaussian processes. They are an interesting tool that I have added to my bag of tricks. Together with Stan, these models are very flexible and fairly easy to implement and test on a dataset. So you may expect more examples in the future.

Appendix

prepare.plot <- function(fit, noise.par = "beta3") {
    # noise.par names the diagonal (noise) element of the kernel:
    # "beta3" for the linear kernel, "beta5" for the LIN + SE kernel.
    require(dplyr)
    ys <- extract(fit, pars = "y2")$y2
    ym <- apply(ys, 2, median)
    ysd <- apply(ys, 2, sd)

    post.var <- extract(fit, pars = noise.par)[[noise.par]]

    all.preds <- ys
    sd.data <- array(NA, c(nrow(all.preds), nrow(icecream)))
    for (i in 1:nrow(icecream)) {
        tmp <- rnorm(nrow(all.preds), icecream$units[i], sqrt(post.var))
        sd.data[, i] <- tmp
    }
    all.bands <- apply(cbind(sd.data, all.preds), 2, quantile,
                       probs = c(0.025, 0.16, 0.84, 0.975))

    plot.data <- data.frame(temp = c(icecream$temp, x2),
        sold = apply(cbind(sd.data, all.preds), 2, median),
        lowest = all.bands[1, ], lower = all.bands[2, ],
        higher = all.bands[3, ], highest = all.bands[4, ])
    daten <- plot.data %>% tbl_df %>% arrange(temp)
    return(daten)
}

sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_0.4.2 rstan_2.7.0-1 inline_0.3.14 Rcpp_0.12.0
## [5] knitr_1.10.5 RWordPress_0.2-3
##
## loaded via a namespace (and not attached):
## [1] codetools_0.2-11 XML_3.98-1.3 assertthat_0.1 bitops_1.0-6
## [5] R6_2.1.0 DBI_0.3.1 stats4_3.2.1 formatR_1.2
## [9] magrittr_1.5 evaluate_0.7 stringi_0.5-5 lazyeval_0.1.10
## [13] tools_3.2.1 stringr_1.0.0 RCurl_1.95-4.7 parallel_3.2.1
## [17] XMLRPC_0.3-0

Germany Quarterly GDP growth 0.4%

Since I was involved in developing the econometric model underlying the DIW Economic Barometer, I continue to follow how well it captures current developments. Last Friday (August 14) was another opportunity to juxtapose the indicator and the data. The first official number for Q2 2015 GDP growth from destatis is 0.4%. The econometric model underlying the DIW Economic Barometer indicated roughly this number. While the indicator is a unitless index not tied to a specific value for GDP growth, my former colleagues, based on their own expertise, put a price tag of 0.5% on the Q2 value of the indicator. Not bad, I’d say!

Programming simple economic experiments in shiny

Note, September 16, 2015: I have corrected some errors in the R code chunks that sneaked in when I transformed the post to WordPress format. Sorry for any inconvenience.

In this post, I want to present a flexible way to implement small surveys or economic experiments in shiny. If you have some background in experimental economics, you may have noticed that the most widely used software to implement economic experiments is zTree. To be honest, I have never touched zTree, so I can say little about it, but my impression was that its capabilities are actually fairly limited. On the other hand, what are the alternatives? Well, let us assume pen & paper is not a viable alternative. A more modern approach is to code the experiment as a small webpage, i.e. use languages such as HTML, JavaScript, Ruby or Python (or a mix of them). The resulting app is then deployed to a server and can be called in a standard web browser. Unfortunately, that requires that you are proficient in at least one of these languages and that you have an idea about how to set up a server to get this running, including how to handle users and sessions, create a database on the server to save the responses, etc. In case you are familiar with these things, I guess that’s the way to go, because it’s tremendously more flexible. Moreover, once you have an experiment running online, it’s available everywhere. No messy zTree installation on PCs in the lab. All you need is that the computers in the lab are connected to the internet (a very mild requirement, I’d say).

When I wanted to run my first experiments, I was not familiar with any of these things and the learning curve was a bit steep for me (both in terms of time and effort). Fortunately, I had a good friend to help me out. So without being a web programmer, or having friends willing to sink their precious free time into your project, what else can you do? Enter: Dean Attali, who blogged about how to mimic a Google Doc in shiny.

Dean's little doc

I think Dean wrote a great post (again someone who seems to be a great teacher) and made a tremendous effort to describe in detail how to build this little app from scratch. His solution also shows his shinyjs package in action, a package that I will make extensive use of here, too.

Taking Dean’s code from his GitHub repo makes it straightforward to customize and expand his doc to a small multi-page survey, where users are prompted to provide answers to several questions and their responses are saved in a ‘database’. He has a follow-up post, where he discusses different possibilities to store these responses not only locally, but also how to push them to a remote server, e.g. Amazon’s S3 or your Dropbox. Using remote storage, e.g. on S3, you can also grant your co-authors immediate access to the data (see here).

Economic Experiments in shiny?

On second thought, many, if not most, experiments that economists run are just like these little surveys. Consider the following structure:

  1. Login page, where participants that were recruited to the lab type in their credentials.
  2. A single or a set of pages with instructions about the experiment that is to follow.
  3. Pages with the main experiment, e.g. a series of pages where the participants have to provide answers to incentivized questions.
  4. Logout page, where participants often receive feedback on their responses and their final payoff.

This structure suffices for many of the experiments that I am aware of. In what follows, I will demonstrate how, with the help of Dean’s code, one can implement this basic structure in shiny. That being said, you may well ask for more – more interaction, fancy graphs – and find that the above structure is too rudimentary for what you have in mind. Think of what comes below as a proof of concept, not an exhaustive example of what is possible. In fact, I believe much fancier things are possible than what I will show below, e.g. having users interact.

Prerequisites

I assume you work from RStudio. We will need the same prerequisites as listed by Dean:

  • Packages: shiny, shinyjs, digest and dplyr installed from CRAN.
  • In case you want to replicate Dean’s or my example, you will need to install the package DT to render the table using the jQuery plug-in DataTables. Apparently, you no longer have to install DT from GitHub, since it has been moved to CRAN as well. If you do not plan to include a data table like the one here, you will not need DT (a one-line install command follows below).
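
For convenience, something like this installs everything from CRAN in one go (DT is optional, as noted above):

install.packages(c("shiny", "shinyjs", "digest", "dplyr", "DT"))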

A simple experiment

I will split the shiny app into three files: helpers.R, server.R and ui.R. In case you are not familiar with the latter two and/or the basic structure of shiny apps, I recommend you read through this introduction first. The file helpers.R will be sourced in one of the two other files and loads some workhorse functions and parameters – it keeps things tidy.

Handling login, instructions & final page

Without any reference to the actual experiment that we will implement, we can begin to code the login page, which may serve as a blueprint independent of what comes thereafter. Let’s begin with the ui.R, i.e. the user interface that participants in the lab will see. Following the basic structure, this will be a login page. Start with a skeletal UI and consider the following basic page:

shinyUI(fluidPage(
useShinyjs(),
div(
id = "login_page",
titlePanel("Welcome to the experiment!"),
br(),
sidebarLayout(

sidebarPanel(
h2("Login"),
p("Welcome to today's experiment. Please use the user name provided on the instructions to login into the experiment."),
hidden(
div(
id = "login_error",
span("Your user name is invalid. Please check for typos and try again.", style = "color:red")
)
)
),

mainPanel(
textInput("user", "User", "123"),
textInput("password", "Password", ""),
actionButton("login", "Login", class = "btn-primary")
)
)
)
)
)

Let’s walk through the code together. First, I load shinyjs by writing useShinyjs() to be able to call functions from that package. Next I use the HTML tag div() to create a division, just like in plain HTML. By giving the division an id (here: id = "login_page"), I can later reference it and thereby change its properties using functions from shinyjs. Shiny keeps a list of all named items (divs, buttons, widgets and so on) in the background and allows you to call them by their name to affect them. Inside the div(...), I have put a plain shiny page that consists of a titlePanel and a sidebarLayout that you should be familiar with. I have put some basic login instructions in the sidebarPanel part of the page, and a little further down you see that the mainPanel contains three widgets: a textInput each for the user name and the password, and an actionButton for the user to confirm her credentials.

Enter: shinyjs

I haven’t talked about the part of the sidebarPanel that is wrapped in the hidden() command. This is our first encounter of a basic JS function provided by shinyjs. If you consult the help file ?hidden, it says:

Create a Shiny tag that is invisible when the Shiny app starts. The tag can be made visible later with shinyjs::toggle or shinyjs::show.

Thus, everything inside hidden will be invisible to the participants/users when the screen/app is loaded. In the above example, I have put an error message inside hidden, which will be uncovered in case the login credentials are invalid. The help file for hidden already mentions that anything hidden can be uncovered using the command show. Likewise, once shown, items can (again) be hidden using hide.

How and under which conditions login_error is uncovered is handled in the server.R script below. Details aside for now, the principle is as follows: the app waits for the user to press the login button. Once it is pressed, the server checks the credentials. This check returns a boolean (TRUE or FALSE), and in case it is FALSE the div with id login_error will be uncovered.
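
To see the hidden/show mechanics in isolation, here is a minimal toy app; it is not part of the experiment code, and the ids and labels are made up for illustration.

library(shiny)
library(shinyjs)

ui <- fluidPage(
useShinyjs(),
actionButton("go", "Reveal"),
hidden(div(id = "secret", p("Now you see me.")))
)

server <- function(input, output, session) {
observeEvent(input$go, {
show("secret")    # toggle("secret") would flip the visibility back and forth
})
}

# shinyApp(ui, server)   # uncomment to run the toy app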

This is also the general principle by which I will build a multi-page app. For every “page” in your app, include a div in the ui.R and hide all but the initial screen using hidden. Then define certain “events” in the server.R script, e.g. the user clicking on a button that says “Continue to next page”. Let the server observe whether this event took place, and if the answer is yes, hide the current page and uncover the next. For example, what happens if the login credentials are correct? The app will hide the div login_page and uncover another div, called instructions say, that was not yet included above, but which prints the instructions to the screen. For example:

shinyUI(fluidPage(
useShinyjs(),
div(
id = "login_page",
titlePanel("Welcome to the experiment!"),
br(),
sidebarLayout(

sidebarPanel(
h2("Login"),
p("Welcome to today's experiment. Please use the user name provided on the instructions to login into the experiment."),
hidden(
div(
id = "login_error",
span("Your user name is invalid. Please check for typos and try again.", style = "color:red")
)
)
),

mainPanel(
textInput("user", "User", ""),
textInput("password", "Password", ""),
actionButton("login", "Login", class = "btn-primary")
)
)
),

hidden(
div( id = "instructions",
h3("Here we post instructions for subjects..."),
p("In this experiment you will have to guess the in wich direction
a coin that is tossed repeatedly is biased. You will observe whether
the coin landed heads or tails over several tosses.... Bla bla"),
actionButton("confirm", label = "Ok, I got it...")
)
),

hidden(
div(
id = "end",
titlePanel("Thank you!"),
br(),
p("End of experiment.")
)
)
)
)

These are not very carefully formatted divs, but they get the idea across. The instructions will be hidden at the outset and only uncovered once the user has submitted a valid user name and password. At the same time, the login_page will be hidden. Otherwise the app will throw the error message. The same goes for the instructions: once the user presses the button labelled confirm, the app hides instructions and uncovers the end div. This gives the user the impression of walking through multiple pages, when in fact it is just a single page with multiple overlays that are covered and uncovered. Bingo!

Once we have understood the general “hidden-hide-show” principle, we can go on and include further pages by adding further hidden divs in the ui.R. Before we do that, let us peek into the server.R to see how the basic three-page structure we have so far is implemented server-side.

shinyServer(
function(input, output, session) {

# When the Login button is clicked, check whether user name is in list
observeEvent(input$login, {

# Disable the login, user cannot click it until it is enabled again
disable("login")

# Check whether user name is correct
# Very simple here: a hard-coded password
user_ok <- input$password==session_password

# If credentials are valid push user into experiment
if(user_ok){
hide("login_page")
show("instructions")

} else {
# If credentials are invalid throw error and prompt user to try again
reset("login_page")
show("login_error")
enable("login")
}

})

observeEvent(input$confirm, {
hide("instructions")
show("end")
})
}
)

The first handler that appears in the server script is observeEvent. Again, consulting the respective help file ?observeEvent we get to know what this is for

Respond to “event-like” reactive inputs, values, and expressions.

This function is used to constantly observe the value of a certain variable. For example, let us consider an actionButton, e.g. the login button:

Creates an action button or link whose value is initially zero, and increments by one each time it is pressed.

When the app is started, it initializes the action button and adds an item to the input list, i.e. for the login button it creates something like

input$login <- 0

This value is constantly monitored by the server-side function observeEvent. Once its value changes, e.g. from zero to one, the code that appears inside the curly brackets will be executed.

In the given case this is a series of shinyjs commands together with an if-else check of whether the credentials are valid. You might wonder where the variable session_password, against which I compare the entered password, comes from. I put it into the helpers.R file, where I park all utilities, such as specific functions that I will have to call and other hard-coded variables.

For now helpers.R, which resides in the same directory as the other two files, is simply

# Password to login for this session
session_password <- "foo"

Of course, I have to source this file at the beginning of the server.R.

If the password is correct, user_ok evaluates to TRUE and the login page will be hidden, whereas the instructions will be shown. If user_ok evaluates to FALSE, the instructions remain hidden and the login page remains visible. However, all text inputs will be reset (they will be empty again), the error message will be shown and the login button enabled again.

This is how a basic login page can be handled. And the “hidden-hide-show” principle is one of the key learning outcomes of this post.

The main part

For the main part we usually have to add quite a bit of functionality, because this is where the actual experiment starts and the app has to react, possibly in multiple ways, to the responses of the user. It is also common to have to store responses over multiple rounds of the same experiment.

I will show some basic functionality here by implementing a very simple (not to say boring) little experiment. The user has to guess in which direction a coin that is flipped is biased. The user observes the tosses in chunks, first one, then three, then five and so on. After each new chunk, the user has to indicate which side she thinks is more likely. And this will be done over several rounds. Sounds very boring – and it is, but it’s a simple example that already features a lot of different things to do.

I will also add some functionality to the final page, by showing a table of the decisions submitted by the user together with a short information text about what is the payoff from the experiment.

Let us begin with the complete ui.R

library(shinyjs)

source('helpers.R')
shinyUI(fluidPage(
useShinyjs(),
div(
id = "login_page",
titlePanel("Welcome to the experiment!"),
br(),
sidebarLayout(

sidebarPanel(
h2("Login"),
p("Welcome to today's experiment. Please use the user name provided on the instructions to login into the experiment."),
hidden(
div(
id = "login_error",
span("Your user name is invalid. Please check for typos and try again.", style = "color:red")
)
)
),

mainPanel(
textInput("user", "User", ""),
textInput("password", "Password", ""),
actionButton("login", "Login", class = "btn-primary")
)
)
),

hidden(
div( id = "instructions",
h3("Here we post instructions for subjects..."),
p("In this experiment you will have to guess the in wich direction
a coin that is tossed repeatedly is biased. You will observe whether
the coin landed heads or tails over several tosses.... Bla bla"),
actionButton("confirm", label = "Ok, I got it... let's start")
)
),

hidden(
div(
id = "form",
titlePanel("Main experimental screen"),

sidebarLayout(

sidebarPanel(
p("Indicate whether you think the coin that was tossed is more likely to land heads or tails based on the throws shown to you on the right."),
radioButtons("guess",
label = h3("Your based on the tosses so far"),
choices = list("Heads" = "Heads", "Tails" = "Tails"),
selected = NULL),
actionButton("submit", "Submit", class = "btn-primary")
),

mainPanel(
h4(textOutput("round_info")),
dataTableOutput(outputId="table")
)
)
)
),

hidden(
div(
id = "end",
titlePanel("Thank you!"),

sidebarLayout(

sidebarPanel(
p("You have reached the end of the experiment. Thank you for your participation."),
h4("Your payoff details:"),
textOutput("round")
),

mainPanel(
h4("Overview over your choices"),
dataTableOutput(outputId="results")
)
)
)
)
)
)

This is roughly the same as before, but with a new div for the main experimental screen. I again opted for the simple sidebarLayout, with the user inputs being a simple radio button (I’d love to initialize it with state NULL so that nothing is selected, but this is not possible – setting it to NULL is like setting it to the first option) and a submit button. The mainPanel consists of a table that lists the heads and tails tossed so far in that round. It comes in the form of a DataTable, hence the DT package.

Note: You are free to do much fancier things than a table. R is great for plotting, and you can leverage its capabilities here by providing graphs that react to user input and much more.

The final page has pretty much the same structure now.

Handling data storage and multiple rounds

Let’s think about how we want the user to experience this ui.R. First the login and the instructions, as before. Then she is pushed into the first round of the main part. There she sees the split screen with the radio button on the left and the table on the right. We now wait for her to guess the bias, i.e. select a side and press submit. Then we want to show further tosses of the coin for the first round, i.e. update the table, but also not forget to save the previous response. To save user responses I bite the code by Dean Attali, so you might want to look at his article for further explanation. All I should mention here is that you will have to create a folder called responses inside the folder where you saved helpers.R, ui.R and server.R. This will be the directory where user responses are saved. In case you want to store data not locally but on a remote server, you will have to change that – but Dean has a separate article on this, too. After the user has played through all rounds, we want to push her out of the main experiment part and route her to the final page, which says good-bye and gives users the possibility to review their choices.

That’s quite a bit to handle by server.R – let’s jump right into my final solution:

library(shiny)
require(digest)
require(dplyr)

source('helpers.R')

shinyServer(
function(input, output, session) {

##########################################################
########### PART I: LOGIN ################################
##########################################################

# When the Login button is clicked, check whether user name is in list
observeEvent(input$login, {

# User-experience stuff
shinyjs::disable("login")

# Check whether user name is correct
# Fix me: test against a session-specific password here, not username
user_ok <- input$password==session_password

# If credentials are valid push user into experiment
if(user_ok){
shinyjs::hide("login_page")
shinyjs::show("instructions")

# Save username to write into data file
output$username <- renderText({input$user})
} else {
# If credentials are invalid throw error and prompt user to try again
shinyjs::reset("login_page")
shinyjs::show("login_error")
shinyjs::enable("login")
}

})

##########################################################
########### PART II: INSTRUCTIONS ########################
##########################################################

observeEvent(input$confirm, {
hide("instructions")
show("form")
})

##########################################################
########### PART III: MAIN EXPERIMENT ####################
##########################################################

## Initialize reactive values
# round is an iterator that counts how often 'submit' as been clicked.
values <- reactiveValues(round = 1)
# df will carry the responses submitted by the user
values$df <- NULL

##########################################################
########## PART IIIa: MAIN HANDLER #######################
##########################################################

## This is the main experiment handler
# Observe the submit button, if clicked... FIRE
observeEvent(input$submit, {

# Increment the round by one
isolate({
values$round <- values$round +1
})

# Call function formData() (see below) to record submitted response
newLine <- isolate(formData())

# Write newLine into data frame df
isolate({
values$df <- rbind(values$df, newLine)
})

# Has the user reached the end of the experiment?
# If so then...
if(values$round > n_guesses){

# Draw a round from all rounds with equal probability
# Note: the username must be numeric here, because it serves
# as a seed for the RNG. See comment below.
isolate(values$payroll <- payoffRound(as.numeric(input$user)))

# Based on the drawn round determine the payoff. People get a Euro for having guessed correctly.
output$round <- renderText({
paste0("The computer selected round ", values$payroll,
". Because you guessed ",ifelse(values$df[values$payroll, 3]==true_state[values$payroll], "correctly ", "incorrectly "),
"we will add ", ifelse(values$df[values$payroll, 3]==true_state[values$payroll], prize, 0),
" Euro to your show-up fee. Your total payoff will therefore equals ",
ifelse(values$df[values$payroll, 3]==true_state[values$payroll], prize, 0) + show_up, " Euro.")
})
isolate(values$df[, 5] <- ifelse(values$df[values$payroll, 3]==true_state[values$payroll], prize, 0) + show_up)

# The function saveData() writes the df to disk.
# This can be a remote server!
saveData(values$df)

# Say good-bye
hide(id = "form")
show(id = "end")
}
})

## Utilities & functions

# I take formData from Dean with minor changes.
# When it is called, it creates a vector of data.
# This will be a row added to values$df - one for each round.
#
# Gather all the form inputs (and add timestamp)
formData <- reactive({
data <- sapply(fieldsAll, function(x) input[[x]])
data <- c(round = values$round-1, data, timestamp = humanTime(), payoff = NA)
data <- t(data)
data
})

# The coin flips shown on the right.
# Note I have added a small delay with progress bar, just to give
# users a more natural look-and-feel, since throwing coins usually takes time.
# I have disabled all of the features of DT below, because they distract users
output$table <- renderDataTable({
if(values$round > 1 && values$round <= n_guesses){
withProgress(message = 'Flipping the coin.',
detail = 'Please wait...', value = 0, {
for (i in 1:15) {
incProgress(1/15)
Sys.sleep(0.02)
}
})
}
idx.row <- sum(!is.na(flips[, min(values$round, n_guesses)]))
idx.col <- min(values$round, n_guesses)
data.frame(Wurf = seq(1, idx.row), Seite= flips[1:idx.row, idx.col])
},
options = list(paging = FALSE,
searching = FALSE,
ordering = FALSE
)
)

# This renders the table of choices made by a participant that is shown
# to them on the final screen
output$results <- renderDataTable({
out <- data.frame(Round = rep(seq(1,n_rounds), each = guesses_per_round),
Guess = rep(seq(1, guesses_per_round), times = n_rounds),
choice = values$df[,3],
actual = rep(true_state, each = guesses_per_round)
)
colnames(out) <- c("Round", "Guess no.", "Your choice", "Correct/True value")
out
},
options = list(paging = FALSE,
searching = FALSE,
ordering = FALSE
)
)

# Tell user where she is
output$round_info <- renderText({
paste0("Round ", ceiling(values$round/guesses_per_round), " of ", n_rounds)
})

}
)

Phew, there are lots of new things. Let me go from top to bottom. First, I initialize an object called values, which is a reactive object. Again, the help file has a concise description of its capabilities:

This function returns an object for storing reactive values. It is similar to a list, but with special capabilities for reactive programming. When you read a value from it, the calling reactive expression takes a reactive dependency on that value, and when you write to it, it notifies any reactive functions that depend on that value. (…)

The reactive variable values$round serves as a counter or iterator that I use to keep track of the round the user is currently in. Most experiments have participants play multiple rounds of the same task, and this variable records which round users are currently in.

Next comes the main handler. The main handler observes the value of input$submit. If the user clicks it, the code in the curly brackets is executed. Inside the curly brackets, we first increase values$round by one. Because this is a reactive value, it broadcasts its new value to all functions that depend on it. If you scroll down to output$table, you will note that, e.g., the data table on the right depends on values$round and hence gets updated once values$round changes.

Back to the main handler. Next, the code creates an object called newLine, which is a row in our later data matrix. We append it to values$df. Then we check whether the previous round was the final one. This is the case if the iterator values$round exceeds the hard-coded total number of guesses, n_guesses. If the final guess has been made, we do four things:

  1. Draw one round randomly and compare the actual bias of the coin to the guess by the user.
  2. Prepare a text message for the final screen which takes into account whether the user was lucky and will receive a bonus for having guessed correctly.

  3. Save all received responses to disk.

  4. Say good-bye by loading the final screen.

The astute reader will miss numerous functions that I call inside server.R, but that are nowhere defined. I follow Dean Attali’s style here and defer them to the helpers file:

# which fields get saved
fieldsAll <- c("user", "guess")

# self-explanatory
responsesDir <- file.path("responses")

# Password to login for this session
session_password <- "foo"

### Generate data here
###
###
###
set.seed(1906)
n_rounds <- 2
n_flips <- 3
probs <- c(0.6,0.4)
prize <- 1
show_up <- 10
probas <- array(, c(n_rounds, 2))
true_state <- sample(c("Heads", "Tails"), n_rounds, replace = TRUE)
for(i in 1:n_rounds){
if(true_state[i]=="Heads"){
probas[i,] <- probs
} else {
probas[i,] <- probs[2:1]
}
}

flips <- sapply(1:n_rounds, function(x) sample(c(1, -1), n_flips,
replace = TRUE,
prob = probas[x, ])
)
flips <- data.frame(flips)

cascade <- function(x, thin){
tmp <- rep(1, n_flips) %x% x
dim(tmp) <- c(n_flips, n_flips)
tmp[lower.tri(tmp)] <- NA
tmp[tmp==1] <- "Heads"
tmp[tmp!="Heads"] <- "Tails"
if(thin > 1){
tmp <- tmp[, seq(1, ncol(tmp), thin)]
}
return(tmp)
}

tmp <- lapply(flips, cascade, thin = 2)
flips <- do.call(cbind, tmp)

n_guesses <- ncol(flips)
guesses_per_round <- n_guesses/n_rounds

# add an asterisk to an input label
labelMandatory <- function(label) {
tagList(
label,
span("*", class = "mandatory_star")
)
}

# CSS to use in the app
appCSS <- ".mandatory_star { color: red; }
.shiny-input-container { margin-top: 25px; }
.shiny-progress .progress-text {
font-size: 18px;
top: 50% !important;
left: 50% !important;
margin-top: -100px !important;
margin-left: -250px !important;
}"

# Helper functions
humanTime <- function() format(Sys.time(), "%d-%m-%Y-%H-%M-%S")

saveData <- function(data) {
fileName <- sprintf("%s_%s.csv",
humanTime(),
digest::digest(data))

write.csv(x = data, file = file.path(responsesDir, fileName),
row.names = FALSE, quote = TRUE)
}

payoffRound <- function(user){
set.seed(user)
out <- sample(seq(1, n_guesses), 1)
return(out)
}

epochTime <- function() {
as.integer(Sys.time())
}

The helper file begins with defining the user inputs that will be saved along with a timestamp etc.

Another thing that I handle there is generating the data. The coin flips are not drawn live during the experiment, but before the first screen is loaded. Because I set a static seed, all users will see the same sequence of flips.

Finally, several workhorse functions are defined. You may find a less thoroughly commented version of the code on GitHub.

Further notes & extensions

An idea that I like to borrow from Joern Hees is the possibility to use the user name (or part of it) as a seed for the random numbers that will be drawn during the experiment. Depending on your analysis and the design of the experiment, this can be very useful. For example, if you have two groups, one treatment and one control group, and you want to show one subject in the control group and one in the treatment group exactly the same stimuli, you can do this by conditioning all random numbers on their user names. For example, take users A-01 and B-01. If we denote the groups by A and B, then it’s easy to extract the two-digit number using a regular expression and take it as a seed for the random number generator.
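
A minimal sketch of that idea; the user-name format A-01/B-01 is the one from the example above, and the helper user_seed is made up for illustration.

# Extract the numeric part of user names like "A-01" or "B-01" and use it as a
# seed, so that matched subjects in treatment and control see the same draws.
user_seed <- function(user) as.integer(sub("^[A-Za-z]+-", "", user))
user_seed("A-01")   # 1
user_seed("B-01")   # 1, i.e. the same seed and hence the same stimuli
set.seed(user_seed("A-01"))
sample(c("Heads", "Tails"), 3, replace = TRUE)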

The type of experiments that can be implemented in this way is still fairly limited. For example, I have not tried to program an experiment where different users interact, e.g. to run experiments that inspect how behavior changes within a group, or how certain types of communication affect outcomes. But given that there are examples of chat rooms being implemented in shiny, I see a chance this can be done.

Conclusion

Even though the length of this post and my messy code may look daunting and give the impression that this is not much easier than becoming a web programmer, it certainly is easier. Shiny handles a lot of the server structure, the initialization of databases, sessions and much more under the hood. The simple experiment I implemented above is more of a proof of concept, and you may wish to expand it in many ways. I certainly have shown only a very narrow set of the things that are possible with shiny.

Thus, in case you have just started running experiments and want more flexibility than zTree without hassling with the gory details, shiny might be an alternative for you.

A glimpse on Gaussian process regression

The initial motivation for me to begin reading about Gaussian process (GP) regression came from Markus Gesmann’s blog entry about generalized linear models in R. The class of models implemented or available with the glm function in R comprises several interesting members that are standard tools in machine learning and data science, e.g. the logistic regression model.

Looking at the scatter plots shown in Markus’ post reminded me of the amazing talk by Michael Betancourt (there are actually two videos, but GPs only appear in the second – make sure you watch them both!). In one of the examples, he uses a Gaussian process with a logistic link function to model data on the acceptance of gay marriage as a function of age. The results he presented were quite remarkable, and I thought that applying the methodology to Markus’ ice cream dataset was a great opportunity to learn what a Gaussian process regression is and how to implement it in Stan. I initially planned not to spend too much time on the theoretical background, but to get to the meat and potatoes quickly, i.e. try them in practice on a dataset, see how they work, make some plots etc. But all introductory texts that I found were either (a) very mathy, or (b) superficial and ad hoc in their motivation. I wasn’t satisfied and had the feeling that GPs remained a black box to me. Maybe you had the same impression and now landed on this site?

I was therefore very happy to find this outstanding introduction by David MacKay (DM). I think it is just perfect – a meticulously prepared lecture by someone who is passionate about teaching.

This provided me with just the right amount of intuition and theoretical backdrop to get to grips with GPs and to explore their properties in R and Stan.

In this post I will follow DM’s game plan and reproduce some of his examples, which gave me a good intuition for what a Gaussian process regression is. In the words of David MacKay: “Throwing mathematical precision to the winds, a Gaussian process can be defined as a probability distribution on a space of functions (…)”. In a future post, I will walk through an implementation in Stan, i.e. show how a GP regression can be fitted to data and used for prediction.

Gaussian processes

Let’s start with a standard definition of a Gaussian process:

Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

That’s a fairly general definition, and moreover it’s not at all obvious what such a collection of rv’s has to do with regression. To draw the connection, let me plot a bivariate Gaussian:

library(mixtools)
library(mvtnorm)
K <- matrix(c(1, 0.9093729, 0.9093729, 1), ncol = 2)
set.seed(1906)
p <- rmvnorm(1, mean = rep(0, 2), sigma = K)
plot(p, type = "n", xlim = c(-5, 5), ylim = c(-5, 5), ylab = expression(y[2]), 
    xlab = expression(y[1]), main = "Fig. 1: Contours of bivariate Gaussian distribution")
abline(h = 0, v = 0)
points(p, cex = 2, pch = 20)
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.05, npoints = 250, col = "darkred")
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.2, npoints = 250, col = "red")

Fig. 1: Contours of bivariate Gaussian distribution

p
##           [,1]      [,2]
## [1,] -1.116858 -1.196803

The mean and covariance are given in the R code above. The point p has coordinates y_1 = -1.12 and y_2 = -1.19. One thing we can glean from the shape of the ellipses is that if y_1 is negative, y_2 is likely to be negative as well, and vice versa: there is positive correlation between the two.
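As a quick sanity check (my own addition, not one of DM’s examples), one can verify the implied correlation by simulating many draws:

set.seed(1906)
draws <- rmvnorm(10000, mean = rep(0, 2), sigma = K)
cor(draws)[1, 2]    # close to the 0.9094 specified in K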

To draw the connection to regression, I plot the point p in a different coordinate system

plot(c(1, 2), p, xlim = c(0, 3), ylim = c(-2, 0), type = "p", pch = 20, cex = 2, 
    xlab = "x", ylab = "y", main = "Fig. 2: Points in x-y-plane")

Fig. 2: Points in x-y-plane

where y = [y_1=-1.12, y_2=-1.19] as before, but now the indexes 1 and 2 act as the explanatory/feature variable x=[1,2]. The y coordinates give us the height of the points for each x. And there is really nothing sacred about the numbers 1 and 2: I could equally well call the coordinates in the first plot [y_{1.763}, y_{\pi}] and pick virtually any number to index them. This illustrates that a Gaussian process can be used to define a distribution over functions of the real numbers.

In our simple starting example, I can draw a line connecting the two dots, much as a regression line would, to illustrate this for two dimensions. But you can perhaps already imagine how I can go to higher-dimensional distributions and fill up any of the gaps before, after or between the two points.

The upshot of this is: every point y=(y_i, y_j) from a set Y with indexes i and j from an index set X can be taken to define two points in the Y \times X plane. And I deliberately wrote i and j instead of 1 and 2, because the indexes can be arbitrary real numbers. I can continue this simple example and sample more points (let me combine the graphs to save some space here):

par(mfcol = c(1, 2))
N <- 15
set.seed(1906)
p <- rmvnorm(N, mean = rep(0, 2), sigma = K)
plot(p, type = "n", xlim = c(-5, 5), ylim = c(-5, 5), ylab = expression(y[2]), 
    xlab = expression(y[1]), main = "Fig. 3a: Multiple draws from the Gaussian")
abline(h = 0, v = 0)
points(p, cex = 2, pch = 20)
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.05, npoints = 250, col = "darkred")
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.2, npoints = 250, col = "red")
plot(rep(1, N), p[, 1], xlim = c(0, 3), ylim = c(-3, 3), type = "p", pch = 20, 
    cex = 2, xlab = "x", ylab = "y", main = "Fig. 3b: Multiple draws in x-y-plane")
points(rep(2, N), p[, 2], pch = 20, cex = 2)
for (i in 1:N) {
    segments(x0 = 1, p[i, 1], x1 = 2, p[i, 2], col = i, lwd = 1.5)
}

Fig. 3: Multiple draws from the Gaussian (3a) and the corresponding points in the x-y-plane (3b)

I have also drawn the line segments connecting the sampled values from the bivariate Gaussian. This illustrates nicely how a zero-mean Gaussian distribution with a simple covariance matrix defines random lines in the right-hand plot. A multivariate Gaussian is like a probability distribution over (finitely many) values of a function.

For now we only have two points on the right, but by going from the bivariate to the n-dimensional normal distribution I can get more points in. And keep in mind that I can also insert points in between – the x domain can be made arbitrarily dense, I need not take just integer values. All I have to do in order to add more points is to specify the mean and the covariance.

Another instructive view emerges when I introduce measurement error or noise into the equation

y_i = f(x_i) + \epsilon_i where \epsilon_i \sim \text{Normal}(0, \sigma^2).

With this my model very much looks like a non-parametric or non-linear regression model with some function f.
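To make this concrete, here is a tiny sketch (my own illustration, reusing the bivariate K from above and an assumed noise level) of simulating observations from this model:

set.seed(1906)
f <- as.vector(rmvnorm(1, mean = rep(0, 2), sigma = K))   # latent function values f(x_1), f(x_2)
sigma <- 0.25                                             # assumed noise standard deviation
y <- f + rnorm(2, mean = 0, sd = sigma)                   # observed y_i = f(x_i) + eps_i
rbind(f = f, y = y)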

Updating the prior

If the Gaussian distribution that we started with is nothing but a prior belief about the shape of a function, then we can update this belief in light of the data.

There is a nice way to illustrate how learning from data actually works in this setting. Say we get to learn the value y_1 = 1.1. In terms of fig. 3b this means we have to fix the left-hand point at 1.1, and any line segment connecting y_1 and y_2 has to originate from there. Maybe this gets the intuition across that this narrows down the range of values that y_2 is likely to take: it now seems even more unlikely than before that, e.g., y_2 turns out to be negative.

We can try to confirm this intuition using the fact that if

K = \begin{bmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{bmatrix}

is the covariance matrix of the Gaussian, we can deduce (see here)

p(y_2 \mid y_1, K) \propto \text{exp} \left( -\frac{1}{2} \begin{bmatrix} y_1 & y_2 \end{bmatrix} K^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \right).

This can be simplified to a normal distribution over y_2,

p(y_2 \mid y_1, K) \propto \text{exp} \left( -\frac{1}{2} \frac{(y_2 - \hat{y}_2)^2}{\sigma_{2 \mid 1}^2} \right)

with mean \hat{y}_2 = 0 + k_{21} k_{11}^{-1} y_1 and variance \sigma_{2 \mid 1}^2 = k_{22} - k_{21} k_{11}^{-1} k_{12}, which equals 1 - k_{12}^2 here. Inserting the given numbers, you see that \hat{y}_2 \approx 1.0003 and that the conditional variance is around 0.17.
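As a quick numerical check, here is a small sketch of these formulas in R (the helper name is made up):

cond_mean_var <- function(y1, K) {
    ## conditional mean and variance of y_2 given y_1 for a zero-mean bivariate Gaussian
    m <- K[2, 1] / K[1, 1] * y1
    v <- K[2, 2] - K[2, 1] / K[1, 1] * K[1, 2]
    c(mean = m, var = v)
}
cond_mean_var(1.1, K)    # roughly 1.0003 and 0.17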

One noteworthy feature of the conditional distribution of y_2 given y_1 and x_1 is that it does not make any reference to the functional form of f(x). In that sense it is a non-parametric prediction method, because it does not depend on specifying the function f linking y to x.

Let’s update Figure 3:

par(mfcol = c(1, 2))
set.seed(1906)
y1.actual <- 1.1
y2.hat <- 0 + y1.actual * K[1, 2]
y2.var <- (1 - K[1, 2]^2)
p2 <- data.frame(y1 = rep(y1.actual, N), y2 = rnorm(N, mean = y2.hat, sd = sqrt(y2.var)))
plot(p2, type = "n", xlim = c(-5, 5), ylim = c(-5, 5), ylab = expression(y[2]), 
    xlab = expression(y[1]), main = "Fig. 3a: Multiple draws from the two-dimensional Gaussian")
abline(h = 0, v = 0)
abline(v = y1.actual, lty = 2, col = "grey", lwd = 1.5)
points(p, cex = 2, pch = 20, col = "lightgrey")
points(p2, cex = 2, pch = 20)
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.05, npoints = 250, col = "darkred")
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.2, npoints = 250, col = "red")
plot(rep(1, N), p2[, 1], xlim = c(0, 3), ylim = c(-3, 3), type = "n", pch = 20, 
    cex = 2, xlab = "x", ylab = "y", main = "Fig. 3b: Multiple draws in x-y-plane")
points(rep(1, N), p[, 1], pch = 20, cex = 2, col = "lightgrey")
points(rep(2, N), p[, 2], pch = 20, cex = 2, col = "lightgrey")
for (i in 1:N) {
    segments(x0 = 1, p[i, 1], x1 = 2, p[i, 2], col = "lightgrey", lwd = 1.5, 
        lty = 2)
}
for (i in 1:N) {
    segments(x0 = 1, p2[i, 1], x1 = 2, p2[i, 2], col = i, lwd = 1.5)
}
points(1, y1.actual, pch = 20, cex = 2)
points(rep(2, N), p2[, 2], pch = 20, cex = 2)

Fig. 3 (updated): Draws conditional on y_1 = 1.1 in the two-dimensional Gaussian (3a) and in the x-y-plane (3b)

The conditional distribution is considerably more concentrated, and the right-hand plot shows how this narrows down the likely values of y_2.

The upshot here is: there is a straightforward way to update the a priori GP to obtain simple expressions for the predictive distribution of points not in our training sample.

Higher dimensions

The connection to non-linear regression becomes more apparent if we move from a bivariate Gaussian to a higher-dimensional distribution. With more than two dimensions I cannot draw the underlying contours of the Gaussian anymore, but I can continue to plot the result in the Y \times X plane.

n <- 6    # number of points on the x axis
## covariance built from the squared exponential kernel discussed below
K <- outer(1:n, 1:n, function(i, j) exp(-0.095 * (i - j)^2))
p <- rmvnorm(N, mean = rep(0, n), sigma = K)
plot(1, type = "n", xlim = c(0, 7), ylim = c(-3, 5), xlab = "x", ylab = "y")
abline(h = 0, v = 0)
for (i in 1:n) {
    points(rep(i, N), p[, i], pch = 20, cex = 2)
}
for (i in 1:N) {
    for (j in 1:(n - 1)) {
        segments(x0 = j, p[i, j], x1 = j + 1, p[i, j + 1], col = i, lwd = 1.5)
    }
}

Multiple draws from the six-dimensional Gaussian in the x-y-plane

where again the mean of the Gaussian is zero and now the covariance matrix is

K
##            [,1]      [,2]      [,3]      [,4]      [,5]       [,6]
## [1,] 1.00000000 0.9093729 0.6838614 0.4252832 0.2187119 0.09301449
## [2,] 0.90937293 1.0000000 0.9093729 0.6838614 0.4252832 0.21871189
## [3,] 0.68386141 0.9093729 1.0000000 0.9093729 0.6838614 0.42528319
## [4,] 0.42528319 0.6838614 0.9093729 1.0000000 0.9093729 0.68386141
## [5,] 0.21871189 0.4252832 0.6838614 0.9093729 1.0000000 0.90937293
## [6,] 0.09301449 0.2187119 0.4252832 0.6838614 0.9093729 1.00000000

Having added more points confirms our intuition that a Gaussian process is like a probability distribution over functions. It also seems that if we added more and more points, the lines would become smoother and smoother. And in fact, for the most common specifications of Gaussian processes this will be the case, i.e. the GP prior will imply a smooth function.
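As a small illustration of this (my own addition, reusing the covariance formula from the code above on a denser grid; the tiny diagonal jitter only keeps the matrix numerically positive definite):

x <- seq(1, 6, by = 0.1)                                       # denser grid on the same domain
K.dense <- outer(x, x, function(a, b) exp(-0.095 * (a - b)^2))
K.dense <- K.dense + diag(1e-8, length(x))                     # jitter for numerical stability
p.dense <- rmvnorm(5, mean = rep(0, length(x)), sigma = K.dense)
matplot(x, t(p.dense), type = "l", lty = 1, lwd = 1.5, xlab = "x", ylab = "y")
abline(h = 0)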

Covariance function

Drawing more points into the plots was easy for me, because I had the mean and the covariance matrix given, but how exactly did I choose them? Randomly? I will give you the details below, but it should be clear that when we want to define a Gaussian process over an arbitrary (but finite) number of points, we need to provide some systematic way that gives us a covariance matrix and the vector of means. The former is usually denoted as k(x, x') for any two (feature) vectors x and x' in the domain of the function. The latter is usually denoted as m(x) and set to zero. With this one usually writes

y \sim GP ( m(x), k(x, x')).

With m set to zero, the entire shape or dynamics of the process are governed by the covariance matrix. If you look back at the last plot, you might notice that the covariance matrix I used to generate points from the six-dimensional Gaussian implies a particular pattern: paths that start above the horizontal line (with a positive value) tend to drift back down, and the other way around for paths that start below it. As in the two-dimensional example that we started with, the covariance matrix implies positive autocorrelation that dies off with the distance between indexes, so the draws are pulled back towards the zero mean while changing only gradually from one index to the next.

Hence, we see one way we can model our prior beliefs: if we had a formula that returns covariance matrices generating this pattern, we would be able to postulate a prior belief for an arbitrary (finite) dimension.

Squared exponential kernel

The formula I used to generate the (i, j)-th element of the covariance matrix of the process was

k_{i,j}(x, x') = \text{exp}(-0.095 (x_i - x_j)^2)

More generally, one may write this covariance function in terms of hyperparameters

k_{i,j}(x, x') = \theta_1 + \theta_2 \, \text{exp}\left(-\frac{1}{2 \ell^2} (x_i - x_j)^2\right)

Because k is a function of the squared Euclidean distance between x_i and x_j, it captures the idea of diminishing correlation between distant points. How quickly this correlation decays with distance is governed by the hyperparameter \ell, which is called the length scale. For large \ell the argument of the exponential stays close to zero, so the kernel is nearly constant over large parts of the domain. The hyperparameter \theta_2 scales the overall variances and covariances, and \theta_1 allows for an offset.
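For illustration, here is a small sketch of this general form as an R function (the function name is mine); setting \theta_1 = 0, \theta_2 = 1 and \ell = \sqrt{1/(2 \cdot 0.095)} \approx 2.29 recovers the kernel used above:

se_kernel <- function(xi, xj, theta1 = 0, theta2 = 1, l = 1) {
    theta1 + theta2 * exp(-0.5 * (xi - xj)^2 / l^2)
}

## with this length scale the kernel reproduces the six-dimensional K from above
l0 <- sqrt(1 / (2 * 0.095))
round(outer(1:6, 1:6, se_kernel, l = l0), 7)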

The squared exponential kernel is apparently the most common functional form for the covariance function in applied work, but it may still seem like a very ad hoc assumption about the covariance structure. It turns out, however, that the squared exponential kernel can be derived from a linear model of basis functions of x (see section 3.1 here). In general, one is free to specify any function that returns a positive definite matrix k(x, x') for all possible x and x'. Discussing the wide array of possible kernels is certainly beyond the scope of this post, so I happily refer the reader to the introductory text by David MacKay (see previous link) and the textbook by Rasmussen and Williams, who devote an entire chapter to covariance functions and their properties.

Up next

Now that I have a rough idea of what a Gaussian process regression is and how it can be used to do non-linear regression, the question is how to make it operational. In the Bayesian paradigm, we would like to learn which values of \theta_1, \theta_2 and \ell are likely in light of the data, and then determine the mode of this posterior (MAP). Likewise, one may specify a likelihood function and use hill-climbing algorithms to find the ML estimates. It is not too hard to imagine that for real-world problems this can be delicate. Especially if we include more than one feature vector, the likelihood is often not unimodal, and all sorts of restrictions on the parameters need to be imposed to guarantee that the result is a covariance function that always returns positive definite matrices.

Fitting a GP to data will be the topic of the next post on Gaussian processes.

Inauguration post

I will start this blog by writing about how I started this blog. One of the first things I wanted to make sure of was being able to blog from R or RStudio without having to use the WordPress.com editor. Using RStudio this is actually pretty simple, so I can take it easy with my first post and reblog what others have invented to make it happen.

I write my blog entries in RStudio using Rmarkdown and then push them to the WordPress servers. And the way I implemented this is much along the lines of what is described here.

Install R package and link to WordPress

Hence, first install the RWordPress package

if (!require(RWordPress)) {
    install.packages("RWordPress", repos = "http://www.omegahat.org/R")
}

and link it to your account (insert your username for ‘yourWPusername’ and your password for ‘yourWPpassword’)

options(WordPressLogin = c(yourWPusername = "yourWPpassword"), WordPressURL = "https://yourWPusername.wordpress.com/xmlrpc.php")

You can do a quick check whether your credentials are working by calling

getUsersBlogs()

So far so good.

Handling graphs and figures

When you blog about R, I guess it’s reasonable to expect to produce many graphs to show on your blog. I handle graphs following Yihui Xie’s suggestion and save them in my Dropbox. I created a separate folder called wp in my Public folder that serves as a bucket for all the figures I create. Then I tell knitr that I want to save any figure it creates to this folder.

opts_knit$set(base.url = 'https://dl.dropboxusercontent.com/u/some_number/wp/',
              base.dir = 'C:/<path to dropbox>/Dropbox/Public/wp/')

The first line contains the URL of the wp folder, which you can find by putting a little file in the folder, right-clicking on it and copying the link. Then insert the copied link and the path to your newly created folder into the snippet above.

knit2html -> WordPress.com format

Suppose you have your blog post saved in a file post.Rmd in the Rmarkdown .Rmd format. Using the function knit2html you can turn it into an HTML file (in RStudio simply press the ‘Knit HTML’ button at the top). To push the content of the HTML file to WordPress, one has to extract the body of the HTML file, otherwise WordPress.com won’t accept it (at least when you are hosting your blog with WordPress.com – which I do).

On the blog I linked above, the author provides a function to sanitize the HTML file and return a string that you can pass on to WordPress using the RWordPress function newPost. But for me it didn’t work, because it inserted crazy special characters here and there. After googling a bit, I found an alternative solution (I forgot where, feel free to point me to the original and I will happily link to it):

knit2wp <- function(file) {
    require(XML)
    content <- readLines(file)
    content <- htmlTreeParse(content, trim=FALSE)

    ## WP will add the h1 header later based on the title, so delete here
    content$children$html$children$body$children$h1 <- NULL
    content <- paste(capture.output(print(content$children$html$children$body,
                                          indent=FALSE, tagSeparator="")),
                     collapse="\n")
    content <- gsub("<?.body>", "", content)         # remove body tag

    ## enclose code snippets in SyntaxHighlighter [code] blocks
    content <- gsub("<?pre><code class=\"r\">", "[code language=\"r\"]\n",
                    content)
    content <- gsub("<?pre><code class=\"no-highlight\">", "[code]\n",
                    content)
    content <- gsub("<?pre><code>", "[code]\n", content)
    content <- gsub("<?/code></pre>", "[/code]\n", content)
    return(content)
}

Workflow

The way it now works for me is as follows. Suppose I have my post written in Rmarkdown and saved in the file test.Rmd. Set the current working directory to the location of this file and then run the following script:

library(RWordPress)
options(WordPressLogin = c(yourWPusername = "yourWPpassword"), WordPressURL = "https://yourWPusername.wordpress.com/xmlrpc.php")
library(knitr)
opts_knit$set(base.url = 'https://dl.dropboxusercontent.com/u/some_number/wp/',
              base.dir = 'C:/<path to dropbox>/Dropbox/Public/wp/')

knit2wp <- function(file) {
    require(XML)
    content <- readLines(file)
    content <- htmlTreeParse(content, trim=FALSE)

    ## WP will add the h1 header later based on the title, so delete here
    content$children$html$children$body$children$h1 <- NULL
    content <- paste(capture.output(print(content$children$html$children$body,
                                          indent=FALSE, tagSeparator="")),
                     collapse="\n")
    content <- gsub("<?.body>", "", content)         # remove body tag

    ## enclose code snippets in SyntaxHighlighter [code] blocks
    content <- gsub("<?pre><code class=\"r\">", "[code language=\"r\"]\n",
                    content)
    content <- gsub("<?pre><code class=\"no-highlight\">", "[code]\n",
                    content)
    content <- gsub("<?pre><code>", "[code]\n", content)
    content <- gsub("<?/code></pre>", "[/code]\n", content)
    return(content)
}

knit2html("test.Rmd")

newPost(
    list(
        description=knit2wp('test.html'),
        title='Test',
        categories=c('Programming', 'R'),
        mt_keywords=c('rstats', 'blogging', 'XML-RPC', 'R', 'knitr', 'markdown')
    ),
    publish=FALSE)

Et voilà!