This is also slightly higher than in the previous months, when the indicator for Q3 stood at 103.8 points. Still, the indicator is slightly down from its Q2 value of around 107. The team at DIW thus sees a good chance that Q3 growth might pick up a bit, climbing from 0.4 to 0.5%.
Considering the sectoral decomposition of the different contributions to the overall indicator, it becomes clear that primarily the service and labor market data (shaded in dark grey and grey) have been driving the positive trend in the indicator. This has basically been the recurring theme over the whole year. Only in the second quarter did the industrial sector (green bars) chip in an additional 1.1 index points, driving the index to its peak at around 107; otherwise, services and robust employment data were the main drivers.
With the latest quarterly value again estimated above the neutral level of 100 points, the Barometer continues a quite remarkable streak of above-average values. In fact, if one uses the interactive chart at the DIW site to zoom into the values that followed the crisis in 2009, it becomes clear that economic conditions have been above average ever since the second half of 2013.
Only in 2012, and very briefly in 2011, conditions worsened to an extent that they were below average. Approaching the second half of 2013, the indicator entered the current streak of continuously positive (i.e. above-average) values. It therefore appears that despite some headwinds, e.g. in the form of the economic weakening in China, the German economy is in very robust shape, with little indication of a downward trend in the near future.
As always, see here for the full discussion with interactive charts and more data (in German) at the DIW Berlin website. The construction of the index is documented here, FAQs are answered here.
With a current value of 103.8 the indicator is slightly down from its Q2 value of around 107, but still above its neutral level of 100. In the previous quarter the team at DIW correctly predicted that q-o-q growth would amount to 0.4% in Q2 ’15 and despite a slightly weaker index, they remain confident that Q3 growth won’t be much different.
While the quarterly index is the headline index of the Barometer, it is always worth taking a look at the monthly series. The Barometer is based on monthly data and in fact the primitive index behind the Barometer is the monthly index, which is aggregated using the Mariano and Murasawa (2003) pattern to form the quarterly index. I like to look at the monthly index because it can often provide a more timely indication of the current trend (upward, downward or sideways), even though it is much more erratic.
If you consider the development over the past two years (beginning January 2013), the index began a gradual climb towards a peak in January 2014 and subsequently followed a very mild downward trend. In 2015 the index has been less erratic than in the previous year, but it nonetheless appears to continue to gradually decrease. Note however that the neutral level of the index is at 33 and that it still remains above that level, at around 35 points in August (the September prediction is roughly neutral at 33-point-something).
See here for the full discussion with interactive charts and more data (in German) at the DIW Berlin website. The construction of the index is documented here, FAQs are answered here.
In the previous post about Gaussian processes it already became clear that the mean function and the covariance function of the Gaussian process are at the heart of building the model.
Nearly all texts that I found about GPs do not explicitly model the mean function $m(x)$, but set it to zero. But I think this need not be the best choice for every application at hand. If you have substantive prior knowledge that makes you confident to postulate some non-zero prior relationship between the mean of $y$ and $x$, one should arguably include it in the prior. However, the subsection on linear kernels below will illustrate how a GP prior with $m(x) = 0$ and a linear kernel can be shown to represent a prior belief that the data come from a linear model with a zero-mean normal prior on the parameters. Hence, the zero mean function induces shrinkage towards zero (to mitigate overfitting), just like in a ridge regression.
I mentioned the squared exponential (SE) kernel already in the last post; in the case of a univariate feature vector it is equivalent to setting the $(i,j)$-th element of the covariance matrix to

$$\Sigma_{ij} = \theta_1 + \theta_2 \exp\!\left(-\frac{(x_i - x_j)^2}{2\ell^2}\right)$$

with (hyper-)parameters $\theta_1, \theta_2, \ell > 0$ such that the resulting covariance matrix is always positive-definite. In the last post on GPs, I mentioned that one implication of such a prior is that (for a given length scale $\ell$) the values $f(x_i)$ and $f(x_j)$ are less correlated the further apart they are in terms of the distance between $x_i$ and $x_j$. What do functions from this prior distribution look like? We can use Stan to simulate processes from the GP prior. The Stan manual provides examples of how to implement this, and I largely copy their code here:
stan_code <- '
// Sample from Gaussian Process
// Hyperparameters of the covariance function are part of the data
data {
  int<lower=1> N;
  real x[N];
  real<lower=0> ell;
  real<lower=0> theta1;
  real<lower=0> theta2;
}
transformed data {
  matrix[N,N] L;
  vector[N] mu;
  cov_matrix[N] Sigma;
  mu <- rep_vector(0, N);
  for(i in 1:N)
    for(j in i:N){
      Sigma[i, j] <- theta1 + theta2 * exp(-(1/(2 * pow(ell,2))) * pow(x[i] - x[j], 2));
    }
  for(i in 1:(N-1))
    for(j in (i+1):N){
      Sigma[j, i] <- Sigma[i, j];
    }
  L <- cholesky_decompose(Sigma);
}
parameters {
  vector[N] z;
}
model {
  z ~ normal(0,1);
}
generated quantities {
  vector[N] y;
  y <- mu + L * z;
}'

library(rstan)
stan_data <- list(N = 100, x = 1:100, ell = 1, theta1 = 1, theta2 = 1)
model <- stan(model_code = stan_code, data = stan_data, iter = 0, chains = 0, verbose = FALSE)
which we can run from R using rstan
ells <- c(0.000095, 1.1, 3.1)
offset <- c(-10, 0, 10)
for (i in 1:length(ells)) {
  sims <- stan(fit = model,
               data = list(N = 100, x = 1:100, ell = ells[i], theta1 = 1, theta2 = 5),
               iter = 50, chains = 1, seed = c(123), verbose = FALSE)
  tmp <- extract(sims, pars = "y")
  if (i == 1) {
    plot(offset[i] + tmp$y[15, ], type = "l", col = i, lwd = 2,
         ylim = c(-15, 15), ylab = "y", xlab = "x",
         sub = "Lines offset by -10, 0 and 10 for illustration")
    legend("topleft", legend = c(paste("l = ", ells)), lwd = rep(2, 3), col = 1:3)
  } else {
    lines(offset[i] + tmp$y[15, ], type = "l", col = i, lwd = 2)
  }
}
The plots confirm a simple intuition: the values of $f(x)$ become more and more correlated across different values of $x$ the higher the length scale $\ell$, because the term inside the exponential is pulled towards zero. Another view on $\ell$ that explains its name is that it scales the $x$-axis. Imagine two points $x_1$ and $x_2$ with Euclidean distance $d$. The term inside the exponential is like $-d^2/(2\ell^2)$. Hence, increasing $\ell$ is just like decreasing the distance $d$, i.e. pulling points closer together, which can be understood as a compression of the $x$-axis. Another view on $\ell$ is through the lens of variable or feature selection. In a model with more than a single explanatory variable, each of which has its own individual length scale $\ell_m$, we can do feature selection through the individual length scales. To see this, note that for $\ell_m \to \infty$ the influence of variable $m$ in the kernel vanishes and the kernel is essentially constant in that variable.
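To put a number on this intuition, here is a quick R check (the helper name is my own) of the correlation implied by the SE part of the kernel, $\exp(-d^2/(2\ell^2))$, for a fixed distance $d$ and increasing length scales:

```r
# Correlation between f(x_i) and f(x_j) implied by the SE kernel
# (ignoring the constant theta1 term): exp(-d^2 / (2 * ell^2))
se_cor <- function(d, ell) exp(-d^2 / (2 * ell^2))

d <- 3  # fixed distance between two inputs
se_cor(d, ell = 1)   # ~0.011: essentially uncorrelated
se_cor(d, ell = 3)   # ~0.607
se_cor(d, ell = 10)  # ~0.956: increasing ell acts like compressing the x-axis
```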
The linear kernel has the nice property that it can be derived from a linear model. Doing so is instructive, because it establishes an example of how kernels can be elicited based on features of the data. For this, suppose we model the relationship between $y$ and $x$ by a set of basis functions $\phi(x)$; e.g. $\phi(x) = (1, x, x^2)$ would be such a set of basis functions of $x$ that describes a polynomial regression. Hence,
is still a linear model, in the sense that it is linear in the parameters $w$. We may write this more compactly as

$$f = \Phi w + \varepsilon,$$

where $\Phi$ is the matrix of regressors and $\varepsilon$ is a conformable vector of iid noise with mean zero and variance $\sigma_e$. If we impose a normal prior $w \sim N(0, \sigma_b I_k)$, where the mean is a vector of zeros and $I_k$ denotes the $k$-dimensional identity matrix ($k$ the number of parameters), it turns out that $f$ is also normal with zero mean

$$E[f] = \Phi\, E[w] = 0$$

and covariance

$$\mathrm{cov}(f) = \Phi\, E[w w^\top]\, \Phi^\top + \sigma_e I = \sigma_b \Phi \Phi^\top + \sigma_e I,$$

i.e. $f \sim N(0,\, \sigma_b \Phi \Phi^\top + \sigma_e I)$.

This means that for any finite number of points the joint distribution of the elements in $f$ is Gaussian: this was the defining property of a Gaussian process stated in the previous post. We conclude that

$$m(x) = 0$$

and

$$k(x, x') = \sigma_b\, \phi(x)^\top \phi(x') + \sigma_e\, \delta_{x x'}.$$
The simplest special case is given by a single feature $\phi(x) = x$ (i.e. $k = 1$). In that case

$$k(x, x') = \sigma_b\, x\, x' + \sigma_e\, \delta_{x x'}.$$
Note that if we want to include an intercept in the linear regression, we can do this either by including a column of ones in $\Phi$ or by setting

$$k(x, x') = \sigma_b\, (x - c)(x' - c) + \sigma_e\, \delta_{x x'}.$$

In the latter case, the parameter $c$ gives the $x$-coordinate at which the prior function will be zero (plus noise, if we include it, as we do here).
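A quick Monte Carlo sketch can back up this derivation (all variable names are my own): draw $w$ from its prior many times, form $f(x) = w(x - c) + \varepsilon$, and compare the empirical covariance of $f$ at a few inputs with $\sigma_b (x - c)(x' - c)$ plus noise on the diagonal.

```r
set.seed(1)
sigma_b <- 0.5; sigma_e <- 1; c0 <- 16   # prior variances and offset
x <- c(10, 20, 30)                       # three input locations
R <- 2e5                                 # number of prior draws

w   <- rnorm(R, 0, sqrt(sigma_b))                         # w ~ N(0, sigma_b)
eps <- matrix(rnorm(R * length(x), 0, sqrt(sigma_e)), R)  # iid noise
f   <- outer(w, x - c0) + eps   # each row: one draw of (f(x1), f(x2), f(x3))

round(cov(f), 1)                                      # empirical covariance
sigma_b * outer(x - c0, x - c0) + sigma_e * diag(3)   # implied kernel matrix
```

The two matrices agree up to simulation noise, which is exactly the claim that the linear kernel is the covariance function induced by a linear model with a zero-mean normal prior on its coefficient.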
One can easily simulate such a model to get an idea of how our prior looks:
stan_code <- '
// Sample from Gaussian Process
// Hyperparameters of the covariance function are part of the data
data {
  int<lower=1> N;
  real x[N];
  real<lower=0> sigma_b;
  real<lower=0> sigma_e;
  real c;
}
transformed data {
  matrix[N,N] L;
  vector[N] mu;
  cov_matrix[N] Sigma;
  mu <- rep_vector(0, N);
  for(i in 1:N)
    for(j in i:N){
      Sigma[i, j] <- sigma_b * (x[i] - c) * (x[j] - c) + if_else(i==j, sigma_e, 0.0);
    }
  for(i in 1:(N-1))
    for(j in (i+1):N){
      Sigma[j, i] <- Sigma[i, j];
    }
  L <- cholesky_decompose(Sigma);
}
parameters {
  vector[N] z;
}
model {
  z ~ normal(0,1);
}
generated quantities {
  vector[N] y;
  y <- mu + L * z;
}'

library(rstan)
stan_data <- list(N = 100, x = 1:100, sigma_e = 1, sigma_b = 1, c = 16)
model <- stan(model_code = stan_code, data = stan_data, iter = 0, chains = 0, verbose = FALSE)
sigmas <- c(0.02, 0.15, 0.5)
for (i in 1:length(sigmas)) {
  sims <- stan(fit = model,
               data = list(N = 100, x = 1:100, sigma_b = sigmas[i], sigma_e = 1, c = 16),
               iter = 50, chains = 1, seed = c(123), verbose = FALSE)
  tmp <- extract(sims, pars = "y")
  if (i == 1) {
    plot(tmp$y[15, ], type = "l", col = i, lwd = 2, ylim = c(-50, 50),
         ylab = "y", xlab = "x")
    legend("topleft", legend = c(paste("sigma_b = ", sigmas)), lwd = rep(2, 3), col = 1:3)
  } else {
    lines(tmp$y[15, ], type = "l", col = i, lwd = 2)
  }
}
The plots show that our prior functional form is indeed linear, with gradual fluctuations around the linear trend. The amplitude of these fluctuations is governed by the error variance: the higher it is, the more erratic the prior. Likewise, the smaller the prior variance of the coefficients, the more likely we think the linear trend is actually just flat. This shows that even with the mean function set to zero, we may still model trending data solely in terms of the covariance function. The sampled lines cross the $x$-axis around the value $c = 16$, as described above. Since we add noise to the observations, the sampled paths are not exactly zero at this point. However, apart from the noise there is no uncertainty at this point, i.e. the prior variance is minimal there and gradually increases as we move away from it.
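The last point can be read off directly from the diagonal of the linear kernel: the prior variance of $f(x)$ is $\sigma_b (x - c)^2 + \sigma_e$, which is smallest, and equal to the pure noise variance, exactly at $x = c$. A one-liner confirms this (using the same hyperparameter values as in the simulation above):

```r
sigma_b <- 1; sigma_e <- 1; c0 <- 16
x <- 1:100
prior_var <- sigma_b * (x - c0)^2 + sigma_e  # diagonal of the linear-kernel covariance

x[which.min(prior_var)]  # 16: uncertainty is minimal at x = c
min(prior_var)           # 1: only the noise variance sigma_e remains there
```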
There are many, many more kernels that have been proposed in different places. I found this little website by David Duvenaud quite helpful (make sure you also check out the chapter of his thesis that is linked on his page) to get (a) an overview of several prominent kernels and (b) an idea of how to combine them to yield ever more flexible structures.
In order to illustrate how to operationalize GP regressions, I decided to pick up on a little demonstration on Markus Gesmann's blog again. Hence, I wrap my discussion around the icecream dataset that Markus used to illustrate some general features of generalized linear models.
icecream <- data.frame(
  temp = c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
  units = c(185L, 215L, 332L, 325L, 408L, 421L, 406L, 412L, 522L, 445L, 544L, 614L)
)

basicPlot <- function(...) {
  plot(units ~ temp, data = icecream, bty = "n", lwd = 2,
       main = "Number of ice creams sold", col = "#00526D",
       xlab = "Temperature (Celsius)", ylab = "Units sold", ...)
  axis(side = 1, col = "grey")
  axis(side = 2, col = "grey")
}
basicPlot()
Markus has a very nice demonstration of how to gradually build a model that fits the data and its properties. From the simple scatter plot we can already see that, if we wish to model the data by means of a GP regression, there is little hope we can employ a simple SE kernel. Instead of being characterized by gradual up- and down-swings, there is a pretty obvious trend in the data, and I thus begin with a linear kernel.
LIN_kernel <- '
data {
  int<lower=1> N1;
  int<lower=1> N2;
  vector[N1 + N2] x;
  int<lower=0> y1[N1];
}
transformed data {
  int<lower=1> N;          // Total length of the feature vector
  vector[N1 + N2] mu;      // Prior mean of the process
  N <- N1 + N2;
  mu <- rep_vector(0, N);
}
parameters {
  vector[N2] y2;
  real<lower=0> beta1;
  real<lower=0> beta2;
  real<lower=0> beta3;
}
model {
  matrix[N, N] Sigma;
  vector[N] y;
  for (n in 1:N1) y[n] <- y1[n];
  for (n in 1:N2) y[N1 + n] <- y2[n];
  for(i in 1:N)
    for(j in i:N){
      Sigma[i, j] <- beta1 * (x[i] - beta2) * (x[j] - beta2) + if_else(i==j, beta3, 0.0);
    }
  for(i in 1:(N-1))
    for(j in (i+1):N){
      Sigma[j, i] <- Sigma[i, j];
    }
  beta1 ~ cauchy(0,5);
  beta2 ~ cauchy(0,5);
  beta3 ~ cauchy(0,5);
  y ~ multi_normal(mu, Sigma);
}'

x1 <- icecream$temp
N1 <- nrow(icecream)
idx1 <- (1):(N1)
x2 <- seq(0, 40, length.out = 40)
N2 <- length(x2)
idx2 <- (N1+1):(N1+N2)
x <- c(x1, x2)  # concatenate observed and prediction points (length must be N1 + N2)

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores() - 1)

stan_data <- list(N1 = N1, N2 = N2, x = x, y1 = icecream$units)
ice.fit <- stan(model_code = LIN_kernel, data = stan_data, iter = 0, chains = 0)
I should mention that I went with the default priors set in some of the example models from the manual. However, the motivation for using the Cauchy prior is that in many settings a Cauchy with location zero and scale five is weakly informative, i.e. the implied prior effect sizes encompass any reasonable value with large probability, while values we consider unreasonable still receive sufficient mass not to be entirely precluded. But I didn't really do any sanity check here, so to build a serious model we would of course have to go through some more elaborate prior elicitation.
ice.res <- stan(fit = ice.fit, data = stan_data, iter = 800, chains = 2, seed = 123)
Note that I have only run 800 iterations on two chains. If you are familiar with other posterior simulation algorithms, e.g. Metropolis-Hastings, you might chuckle a bit, because these algorithms often require north of several thousand iterations to achieve convergence even on simple models. Plus, they are marred by large amounts of autocorrelation between samples from the posterior, which in turn makes a large number of iterations necessary. With Stan, 800 iterations are really enough for this simple model.
We can confirm this by looking at the summary statistics for the hyperparameters:
paras <- c('beta1', 'beta2', 'beta3')
tmp.sum <- summary(ice.res, pars = paras)
tmp <- tmp.sum$summary[, "Rhat"]
tmp
##    beta1    beta2    beta3
## 1.004882 1.042516 1.035404
The potential scale reduction factor Rhat indicates convergence for all parameters. Let us look at the autocorrelation:
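As a refresher on what Rhat measures, here is a toy version of the (non-split) potential scale reduction factor, comparing between-chain and within-chain variance. This is a simplified illustration, not Stan's exact split-Rhat computation:

```r
# chains: an iterations-by-chains matrix of draws for one parameter
rhat <- function(chains) {
  n <- nrow(chains)
  B <- n * var(colMeans(chains))     # between-chain variance
  W <- mean(apply(chains, 2, var))   # within-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(42)
mixed <- matrix(rnorm(2000), ncol = 2)              # two well-mixed chains
rhat(mixed)                                         # close to 1
stuck <- cbind(rnorm(1000), rnorm(1000, mean = 3))  # chains in different regions
rhat(stuck)                                         # far above the usual 1.1 threshold
```

When chains explore the same distribution, between-chain variance is negligible and Rhat is near one; when they get stuck in different regions, Rhat blows up, which is why values above roughly 1.1 signal non-convergence.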
test <- as.matrix(ice.res, pars = paras)
colnames(test) <- paras
par(mfcol = c(1, 3))
sapply(1:3, function(i) acf(test[, i], main = colnames(test)[i]))
Pretty good for just a few hundred iterations!
Hence, we can go on and plot the fit of the model together with out-of-sample predictions and some credibility region (in case you are re-running this on your computer, you certainly do not want to follow me here and explore the model using static graphs, but use ShinyStan instead). One nice thing about doing Bayesian inference here is that the uncertainty bands surrounding our estimates (i) are actual posterior intervals (which many people wrongly take confidence intervals to be), (ii) do not have a rigid theoretical form that hinges on asymptotic approximations, and (iii) price in the estimation uncertainty surrounding the unknowns in our model, e.g. the hyperparameters in the kernel.
library(dplyr)
ys <- extract(ice.res, pars = "y2")$y2
ym <- apply(ys, 2, median)
all.bands <- apply(ys, 2, quantile, probs = c(0.025, 0.16, 0.84, 0.975))
plot.data <- data.frame(temp = x2, sold = ym,
                        lowest = all.bands[1, ], lower = all.bands[2, ],
                        higher = all.bands[3, ], highest = all.bands[4, ])
daten <- plot.data %>% tbl_df %>% arrange(temp)

plot(daten$temp, daten$sold, type = "l",
     main = "Number of ice creams sold", col = "#00526D",
     xlab = "Temperature (Celsius)", ylab = "Units sold",
     sub = "Dashed lines show 68% and 95% posterior interval, solid line depicts posterior median",
     ylim = c(0, 900))
abline(h = 0, v = 0)
points(icecream$temp, icecream$units)
lines(daten$temp, daten$lowest, lty = 2, col = "grey")
lines(daten$temp, daten$highest, lty = 2, col = "grey")
lines(daten$temp, daten$lower, lty = 2)
lines(daten$temp, daten$higher, lty = 2)
Note that the plot not only shows predictions to the left and the right of our data points, i.e. for the region below 12 and above 26 degrees, but also between points. If you look at the code above, you’ll see that I have squeezed some x-values between some of the observations for the algorithm to generate. Any point that is not directly observed can be (and in fact is) treated like a parameter we can estimate. So you see that estimation uncertainty remains low between two points, and that this is not just because we have drawn a straight line between the ends of the pointwise error bands – it really is fairly low. You can also notice that the uncertainty increases slightly to the left and the right of our observations, which seems reasonable.
Just to illustrate how it affects our inference, let me add an SE kernel to the linear kernel:
LIN_SE_kernel <- '
data {
  int<lower=1> N1;
  int<lower=1> N2;
  vector[N1 + N2] x;
  int<lower=0> y1[N1];
}
transformed data {
  int<lower=1> N;          // Total length of the feature vector
  vector[N1 + N2] mu;      // Prior mean of the process
  N <- N1 + N2;
  mu <- rep_vector(0, N);
}
parameters {
  vector[N2] y2;
  real<lower=0> beta1;
  real<lower=0> beta2;
  real<lower=0> beta3;
  real<lower=0> beta4;
  real<lower=0> beta5;
}
model {
  matrix[N, N] Sigma;
  vector[N] y;
  for (n in 1:N1) y[n] <- y1[n];
  for (n in 1:N2) y[N1 + n] <- y2[n];
  for(i in 1:N)
    for(j in i:N){
      Sigma[i, j] <- beta1 * exp(-beta2 * pow(x[i] - x[j], 2))
                     + beta3 * (x[i] - beta4) * (x[j] - beta4)
                     + if_else(i==j, beta5, 0.0);
    }
  for(i in 1:(N-1))
    for(j in (i+1):N){
      Sigma[j, i] <- Sigma[i, j];
    }
  beta1 ~ cauchy(0,5);
  beta2 ~ cauchy(0,5);
  beta3 ~ cauchy(0,5);
  beta4 ~ cauchy(0,5);
  beta5 ~ cauchy(0,5);
  y ~ multi_normal(mu, Sigma);
}'

x1 <- icecream$temp
N1 <- nrow(icecream)
idx1 <- (1):(N1)
x2 <- seq(0, 40, length.out = 40)
N2 <- length(x2)
idx2 <- (N1+1):(N1+N2)
x <- c(x1, x2)  # concatenate observed and prediction points (length must be N1 + N2)

stan_data <- list(N1 = N1, N2 = N2, x = x, y1 = icecream$units)
ice.fit <- stan(model_code = LIN_SE_kernel, data = stan_data, iter = 0, chains = 0)
ice.res <- stan(fit = ice.fit, data = stan_data, iter = 800, chains = 2, seed = 123)
paras <- c("beta1", "beta2", "beta3", "beta4", "beta5")
tmp.sum <- summary(ice.res, pars = paras)
tmp <- tmp.sum$summary[, "Rhat"]
tmp

ys <- extract(ice.res, pars = "y2")$y2
ym <- apply(ys, 2, median)
all.bands <- apply(ys, 2, quantile, probs = c(0.025, 0.16, 0.84, 0.975))
plot.data <- data.frame(temp = x2, sold = ym,
                        lowest = all.bands[1, ], lower = all.bands[2, ],
                        higher = all.bands[3, ], highest = all.bands[4, ])
daten <- plot.data %>% tbl_df %>% arrange(temp)

plot(daten$temp, daten$sold, type = "l",
     main = "Number of ice creams sold", col = "#00526D",
     xlab = "Temperature (Celsius)", ylab = "Units sold",
     sub = "Dashed lines show 68% and 95% posterior interval, solid line depicts posterior median",
     ylim = c(0, 800))
points(icecream$temp, icecream$units)
lines(daten$temp, daten$lowest, lty = 2, col = "grey")
lines(daten$temp, daten$highest, lty = 2, col = "grey")
lines(daten$temp, daten$lower, lty = 2)
lines(daten$temp, daten$higher, lty = 2)
I must say I am quite amazed by the plot. The extent to which the replicated data matches the actual observations is striking. Additionally, the way we modelled the data based on Markus’ initial suggestions delivers very plausible predictions overall. The model predicts, for example, that 95% of the sales we can make at zero degrees are between 21 and 87 units, with the median being 42 units.
The overall shape of our posterior predictions closely matches those that also come out of Markus’ binomial model, but you see that this model is a bit more flexible in capturing the wiggly structure of the data in the middle.
I am happy to have taken the chance to learn about Gaussian processes. They are an interesting tool that I have added to my bag of tricks. Together with Stan, these models are very flexible and fairly easy to implement and test on a data set. So you may expect more examples in the future.
prepare.plot <- function(fit) {
  require(dplyr)
  ys <- extract(fit, pars = "y2")$y2
  ym <- apply(ys, 2, median)
  all.bands <- apply(ys, 2, quantile, probs = c(0.025, 0.16, 0.84, 0.975))
  plot.data <- data.frame(temp = x2, sold = ym,
                          lowest = all.bands[1, ], lower = all.bands[2, ],
                          higher = all.bands[3, ], highest = all.bands[4, ])
  daten <- plot.data %>% tbl_df %>% arrange(temp)
  return(daten)
}
sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] dplyr_0.4.2      rstan_2.7.0-1    inline_0.3.14    Rcpp_0.12.0
## [5] knitr_1.10.5     RWordPress_0.2-3
##
## loaded via a namespace (and not attached):
##  [1] codetools_0.2-11 XML_3.98-1.3     assertthat_0.1   bitops_1.0-6
##  [5] R6_2.1.0         DBI_0.3.1        stats4_3.2.1     formatR_1.2
##  [9] magrittr_1.5     evaluate_0.7     stringi_0.5-5    lazyeval_0.1.10
## [13] tools_3.2.1      stringr_1.0.0    RCurl_1.95-4.7   parallel_3.2.1
## [17] XMLRPC_0.3-0
When I wanted to run my first experiments, I was not familiar with any of these things, and the learning curve was a bit steep for me (both in terms of time and effort). Fortunately, I had a good friend to help me out. So without being a web programmer, or having friends willing to sink their precious free time, what else can you do? Enter: Dean Attali, who blogged about how to mimic a Google Doc in shiny.
I think Dean wrote a great post (again someone who seems to be a great teacher) and made a tremendous effort to describe in detail how to build this little app from scratch. His solution also shows his shinyjs package in action, a package that I will make extensive use of here, too.
Taking Dean’s code from his GitHub repo makes it straightforward to customize and expand his doc to a small multi-page survey, where users are prompted to provide answers to several questions and their responses are saved in a ‘database’. He has a follow-up post, where he discusses different possibilities to store these responses not only locally, but also how to push them to a remote server, e.g. Amazon’s S3 or your Dropbox. Using remote storage, e.g. on S3, you can also grant your co-authors immediate access to the data (see here).
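For concreteness, the local-storage variant boils down to writing one small file per submission and binding all files together for analysis. Here is a minimal sketch along those lines (the function names and file-name scheme are my own illustration, not Dean's exact code):

```r
# Save one participant's responses as a uniquely named CSV.
# Timestamp plus a random suffix keeps concurrent submissions from colliding.
saveData <- function(data, dir = "responses") {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  suffix <- paste0(sample(c(letters, 0:9), 8, replace = TRUE), collapse = "")
  fileName <- sprintf("%s_%s.csv", as.integer(Sys.time()), suffix)
  write.csv(data, file.path(dir, fileName), row.names = FALSE)
}

# Read all stored responses back into one data frame
loadData <- function(dir = "responses") {
  files <- list.files(dir, full.names = TRUE)
  do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
}
```

Swapping the local directory for Dropbox or S3 is then just a matter of replacing the read/write calls, as Dean's follow-up post shows.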
On second thought, many, if not most, experiments that economists run are just like these little surveys. Consider the following structure:
This structure suffices for many of the experiments that I am aware of. In what follows, I will demonstrate how, with the help of Dean’s code, one can implement this basic structure in shiny. That being said, you may well ask for more – more interaction, fancy graphs – and find that the above structure is too rudimentary for what you have in mind. Think of what comes below as a proof of concept, not an exhaustive example of what is possible. In fact, I believe much fancier things are possible than what I will show below, e.g. to have users interact etc.
I assume you work from RStudio. We will need the same prerequisites as listed by Dean: shiny, shinyjs, digest and dplyr installed from CRAN, plus DT to render the table using the jQuery plug-in DataTables. Apparently, you won’t have to install DT from GitHub anymore, since it was moved to CRAN as well. If you do not plan to include a data table like here, you will not need DT.

I will split up the shiny app into three files: helpers.R, server.R and ui.R. In case you are not familiar with the latter two and/or the basic structure of shiny apps, I recommend you read through this introduction first. The file helpers.R will be sourced in one of the two other files and load some workhorse functions and parameters – it makes things more tidy.
Without any reference to the actual experiment that we will implement, we can begin to code the login page, which may serve as a blueprint independent of what comes thereafter. Let’s begin with the ui.R, i.e. the user interface that participants in the lab will see. Following the basic structure, this will be a login page. Start with a skeletal UI and consider the following basic page:
shinyUI(fluidPage(
  useShinyjs(),
  div(
    id = "login_page",
    titlePanel("Welcome to the experiment!"),
    br(),
    sidebarLayout(
      sidebarPanel(
        h2("Login"),
        p("Welcome to today's experiment. Please use the user name provided on the instructions to login into the experiment."),
        hidden(
          div(
            id = "login_error",
            span("Your user name is invalid. Please check for typos and try again.",
                 style = "color:red")
          )
        )
      ),
      mainPanel(
        textInput("user", "User", "123"),
        textInput("password", "Password", ""),
        actionButton("login", "Login", class = "btn-primary")
      )
    )
  )
))
Let’s walk through the code together. First, I load shinyjs by writing useShinyjs() to be able to call functions from that package. Next, I use the HTML tag div() to create a division, just like in plain HTML. By giving the division an id (here: id = "login_page"), I can later reference it and thereby change its properties using functions from shinyjs. Shiny will keep a list of all named items (divs, buttons, widgets and so on) in the background and allow you to call them by their name to affect them. Inside the div(...), I have put a plain shiny page that consists of a titlePanel and a sidebarLayout that you should be familiar with. I have put some basic login instructions in the sidebarPanel part of the page, and a little further down you see that the mainPanel holds three widgets: a textInput for the username, another textInput for the password, and an actionButton for the user to confirm her credentials.
I haven’t talked about the part of the sidebarPanel that is wrapped in the hidden() command. This is our first encounter with a basic JS function provided by shinyjs. If you consult the help file ?hidden, it says:
“Create a Shiny tag that is invisible when the Shiny app starts. The tag can be made visible later with shinyjs::toggle or shinyjs::show.”
Thus, everything inside hidden will be invisible to the participants/users when the screen/app is loaded. In the above example, I have put an error message inside hidden, which will be uncovered in case the login credentials are invalid. The help file for hidden already mentions that anything hidden can be uncovered using the command show. Likewise, once shown, items can (again) be hidden using hide.
How and under which conditions login_error is uncovered is handled in the server.R script below. Details aside for now, the principle is as follows: the app waits for the user to press the button login. Once pressed, the server checks the credentials. This check returns a boolean (true or false), and in case it is FALSE the div with id login_error will be uncovered.
This is also the general principle by which I will build a multi-page app. For every “page” in your app, include a div in the ui.R and hide all but the initial screen using hidden. Then define certain “events” in the server.R script, e.g. the user clicking on a button that says “Continue to next page”. Let the server observe whether this event took place, and if the answer is yes, hide the current page and uncover the next. For example, what happens if the login credentials are correct? The app will hide the div login_page and uncover another div, called instructions say, that was not yet included above, but which prints the instructions to the screen. E.g.
shinyUI(fluidPage(
  useShinyjs(),
  div(
    id = "login_page",
    titlePanel("Welcome to the experiment!"),
    br(),
    sidebarLayout(
      sidebarPanel(
        h2("Login"),
        p("Welcome to today's experiment. Please use the user name provided on the instructions to login into the experiment."),
        hidden(
          div(
            id = "login_error",
            span("Your user name is invalid. Please check for typos and try again.",
                 style = "color:red")
          )
        )
      ),
      mainPanel(
        textInput("user", "User", ""),
        textInput("password", "Password", ""),
        actionButton("login", "Login", class = "btn-primary")
      )
    )
  ),
  hidden(
    div(
      id = "instructions",
      h3("Here we post instructions for subjects..."),
      p("In this experiment you will have to guess in which direction a coin that is tossed repeatedly is biased. You will observe whether the coin landed heads or tails over several tosses.... Bla bla"),
      actionButton("confirm", label = "Ok, I got it...")
    )
  ),
  hidden(
    div(
      id = "end",
      titlePanel("Thank you!"),
      br(),
      p("End of experiment.")
    )
  )
))
These are not very carefully formatted divs, but they get the idea across. The instructions will be hidden at the outset and only uncovered once the user has submitted a valid username and password; at the same time, the login_page will be hidden. Otherwise the app will throw the error message. Same for the instructions: once the user presses the button labelled confirm, the app hides instructions and uncovers the end div. This gives the user the impression of walking through multiple pages, where in fact it is just a single page with multiple overlays that are covered and uncovered. Bingo!

Once we understand the general “hidden-hide-show” principle, we can go on and include further pages by adding further hidden divs in the ui.R. Before we do that, let us peek into the server.R to see how the basic three-page structure we have until now is implemented server-side.
shinyServer(
  function(input, output, session) {

    # When the Login button is clicked, check whether user name is in list
    observeEvent(input$login, {

      # Disable the login, user cannot click it until it is enabled again
      disable("login")

      # Check whether user name is correct
      # Very simple here: a hard-coded password
      user_ok <- input$password == session_password

      # If credentials are valid push user into experiment
      if (user_ok) {
        hide("login_page")
        show("instructions")
      } else {
        # If credentials are invalid throw error and prompt user to try again
        reset("login_page")
        show("login_error")
        enable("login")
      }
    })

    observeEvent(input$confirm, {
      hide("instructions")
      show("end")
    })
  }
)
The first handler that appears in the server script is observeEvent. Again, consulting the respective help file ?observeEvent, we get to know what this is for:
“Respond to “event-like” reactive inputs, values, and expressions.”
This function is used to constantly observe the value of a certain variable. For example, consider an actionButton, e.g. the login button:
“Creates an action button or link whose value is initially zero, and increments by one each time it is pressed.”
When the app is started, it initializes the action button and adds an item to the input list, i.e. for the login button it creates something like
input$login <- 0
This value is constantly monitored by the server-side function observeEvent. Once its value changes, e.g. from zero to one, the code that appears inside the curly brackets will be executed.
In the given case this is a series of shinyjs commands together with an if-else check of whether the credentials are valid. You might wonder where the variable session_password that I compare the entered password to comes from. I put it into the helpers.R file, where I park all utilities, such as specific functions that I will have to call or other hard-coded variables.
For now helpers.R, which resides in the same directory as the other two files, is simply
# Password to login for this session
session_password <- "foo"
Of course, I have to source this file at the beginning of the server.R.
If the password is correct, user_ok evaluates to TRUE and the login page will be hidden, whereas the instructions will be shown. If user_ok evaluates to FALSE, the instructions remain hidden and the login page remains visible. However, all text inputs will be reset (they will be empty again), the error message will be shown, and the login button will be enabled again.
This is how a basic login page can be handled. And the “hidden-hide-show” principle is one of the key learning outcomes of this post.
For the main part we usually have to add quite a bit of functionality, because this is where the actual experiment starts and the app may have to react in multiple ways to the responses by the user. It is also common to have to store responses over multiple rounds of the same experiment.
I will show some basic functionality here by implementing a very simple (not to say boring) little experiment. The user has to guess in which direction a flipped coin is biased. The user observes the tosses in chunks: first one, then three, then five, and so on. After each new chunk, the user has to indicate which side she thinks is the more likely one. And this is done over several rounds. Sounds very boring – and it is, but it is a simple example that already features a lot of different things to do.
I will also add some functionality to the final page, showing a table of the decisions submitted by the user together with a short text about the payoff from the experiment.
Let us begin with the complete ui.R
library(shinyjs)
source('helpers.R')

shinyUI(fluidPage(
  useShinyjs(),
  div(
    id = "login_page",
    titlePanel("Welcome to the experiment!"),
    br(),
    sidebarLayout(
      sidebarPanel(
        h2("Login"),
        p("Welcome to today's experiment. Please use the user name provided on the instructions to login into the experiment."),
        hidden(
          div(
            id = "login_error",
            span("Your user name is invalid. Please check for typos and try again.", style = "color:red")
          )
        )
      ),
      mainPanel(
        textInput("user", "User", ""),
        textInput("password", "Password", ""),
        actionButton("login", "Login", class = "btn-primary")
      )
    )
  ),
  hidden(
    div(
      id = "instructions",
      h3("Here we post instructions for subjects..."),
      p("In this experiment you will have to guess in which direction a coin that is tossed repeatedly is biased. You will observe whether the coin landed heads or tails over several tosses.... Bla bla"),
      actionButton("confirm", label = "Ok, I got it... let's start")
    )
  ),
  hidden(
    div(
      id = "form",
      titlePanel("Main experimental screen"),
      sidebarLayout(
        sidebarPanel(
          p("Indicate whether you think the coin that was tossed is more likely to land heads or tails based on the throws shown to you on the right."),
          radioButtons("guess", label = h3("Your guess based on the tosses so far"),
                       choices = list("Heads" = "Heads", "Tails" = "Tails"),
                       selected = NULL),
          actionButton("submit", "Submit", class = "btn-primary")
        ),
        mainPanel(
          h4(textOutput("round_info")),
          dataTableOutput(outputId = "table")
        )
      )
    )
  ),
  hidden(
    div(
      id = "end",
      titlePanel("Thank you!"),
      sidebarLayout(
        sidebarPanel(
          p("You have reached the end of the experiment. Thank you for your participation."),
          h4("Your payoff details:"),
          textOutput("round")
        ),
        mainPanel(
          h4("Overview over your choices"),
          dataTableOutput(outputId = "results")
        )
      )
    )
  )
))
This is roughly the same as before, but with a new div for the main experimental screen. I again opted for the simple sidebarLayout
with the user inputs being a simple radio button (I’d love to initialize it with state NULL
so that nothing is selected, but this is not possible – setting it to NULL
is like selecting the first option) and a submit button. The mainPanel
consists of a table listing the Heads-Tails sequence for that round. It comes in the form of a DataTable
, hence the DT
package.
Note: You are free to do much more fancy things than a table. R is great for plotting and you can leverage its capabilities here by providing graphs that react to user input and much more.
The final page has pretty much the same structure now.
Let’s think about how we want the user to experience this ui.R
. First the login and instructions as before. Then she is pushed into the first round of the main part. There she observes the split screen with radio button on the left, table on the right. We now wait for her to guess the bias, i.e. select and press submit
. Now we want to show further tosses of the coin for the first round, i.e. update the table, but also not forget to save the previous response. To save user responses I borrow the code by Dean Attali, so you might want to look at his article for further explanation. All I should mention here is that you will have to create a folder inside the folder where you saved helpers.R
, ui.R
and server.R
that is called responses
. This will be the directory where user responses will be saved. In case you want to store data not locally, but on a remote server, you will have to change that – but Dean has a separate article on this. After the user has played through all rounds, we want to push her out of the main experiment part and route her to the final page that says good-bye and gives users the possibility to review their choices.
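For intuition, the saving logic boils down to writing each completed session's data frame to a uniquely named CSV inside that folder. Here is a minimal sketch of that idea (the helper name is mine, a temporary directory stands in for responses/, and the real app additionally appends a digest to the file name):

```r
# Hypothetical simplification of the response-saving idea:
# each finished session gets written to its own CSV file.
responsesDir <- file.path(tempdir(), "responses")  # stand-in for the responses/ folder
dir.create(responsesDir, showWarnings = FALSE)

saveResponses <- function(df, dir) {
  # Unique-ish file name from a timestamp (the real app also appends a digest)
  fileName <- sprintf("%s.csv", format(Sys.time(), "%d-%m-%Y-%H-%M-%S"))
  path <- file.path(dir, fileName)
  write.csv(df, path, row.names = FALSE)
  path
}

df <- data.frame(round = 1:2, guess = c("Heads", "Tails"))
out <- saveResponses(df, responsesDir)
```

Reading the file back gives exactly the submitted rows, which is all the analysis stage needs.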
That’s quite a bit to handle by server.R
– let’s jump right into my final solution:
library(shiny)
require(digest)
require(dplyr)
source('helpers.R')

shinyServer(function(input, output, session) {

  ##########################################################
  ########### PART I: LOGIN ################################
  ##########################################################

  # When the Login button is clicked, check whether user name is in list
  observeEvent(input$login, {
    # User-experience stuff
    shinyjs::disable("login")
    # Check whether user name is correct
    # Fix me: test against a session-specific password here, not username
    user_ok <- input$password == session_password
    # If credentials are valid push user into experiment
    if (user_ok) {
      shinyjs::hide("login_page")
      shinyjs::show("instructions")
      # Save username to write into data file
      output$username <- renderText({ input$user })
    } else {
      # If credentials are invalid throw error and prompt user to try again
      shinyjs::reset("login_page")
      shinyjs::show("login_error")
      shinyjs::enable("login")
    }
  })

  ##########################################################
  ########### PART II: INSTRUCTIONS ########################
  ##########################################################

  observeEvent(input$confirm, {
    hide("instructions")
    show("form")
  })

  ##########################################################
  ########### PART III: MAIN EXPERIMENT ####################
  ##########################################################

  ## Initialize reactive values
  # round is an iterator that counts how often 'submit' has been clicked.
  values <- reactiveValues(round = 1)
  # df will carry the responses submitted by the user
  values$df <- NULL

  ##########################################################
  ########## PART IIIa: MAIN HANDLER #######################
  ##########################################################

  ## This is the main experiment handler
  # Observe the submit button; if clicked, fire
  observeEvent(input$submit, {
    # Increment the round by one
    isolate({
      values$round <- values$round + 1
    })
    # Call function formData() (see below) to record submitted response
    newLine <- isolate(formData())
    # Write newLine into data frame df
    isolate({
      values$df <- rbind(values$df, newLine)
    })
    # Has the user reached the end of the experiment?
    # If so then...
    if (values$round > n_guesses) {
      # Draw a round from all rounds with equal probability
      # Note: the username must be numeric here, because it serves
      # as a seed for the RNG. See comment below.
      isolate(values$payroll <- payoffRound(as.numeric(input$user)))
      # Based on the drawn round determine the payoff.
      # People get a Euro for having guessed correctly.
      output$round <- renderText({
        paste0("The computer selected round ", values$payroll,
               ". Because you guessed ",
               ifelse(values$df[values$payroll, 3] == true_state[values$payroll],
                      "correctly ", "incorrectly "),
               "we will add ",
               ifelse(values$df[values$payroll, 3] == true_state[values$payroll],
                      prize, 0),
               " Euro to your show-up fee. Your total payoff will therefore equal ",
               ifelse(values$df[values$payroll, 3] == true_state[values$payroll],
                      prize, 0) + show_up,
               " Euro.")
      })
      isolate(values$df[, 5] <- ifelse(values$df[values$payroll, 3] == true_state[values$payroll],
                                       prize, 0) + show_up)
      # The function saveData() writes the df to disk.
      # This can be a remote server!
      saveData(values$df)
      # Say good-bye
      hide(id = "form")
      show(id = "end")
    }
  })

  ## Utilities & functions

  # I take formData from Dean with minor changes.
  # When it is called, it creates a vector of data.
  # This will be a row added to values$df - one for each round.
  #
  # Gather all the form inputs (and add timestamp)
  formData <- reactive({
    data <- sapply(fieldsAll, function(x) input[[x]])
    data <- c(round = values$round - 1, data, timestamp = humanTime(), payoff = NA)
    data <- t(data)
    data
  })

  # The coin flips shown on the right.
  # Note I have added a small delay with progress bar, just to give
  # users a more natural look-and-feel, since throwing coins usually takes time.
  # I have disabled all of the features of DT below, because they distract users
  output$table <- renderDataTable({
    if (values$round > 1 && values$round <= n_guesses) {
      withProgress(message = 'Flipping the coin.', detail = 'Please wait...', value = 0, {
        for (i in 1:15) {
          incProgress(1/15)
          Sys.sleep(0.02)
        }
      })
    }
    idx.row <- sum(!is.na(flips[, min(values$round, n_guesses)]))
    idx.col <- min(values$round, n_guesses)
    # (German column names: Wurf = toss, Seite = side)
    data.frame(Wurf = seq(1, idx.row),
               Seite = flips[1:idx.row, idx.col])
  },
  options = list(paging = FALSE, searching = FALSE, ordering = FALSE)
  )

  # This renders the table of choices made by a participant that is shown
  # to them on the final screen
  output$results <- renderDataTable({
    out <- data.frame(Round = rep(seq(1, n_rounds), each = guesses_per_round),
                      Guess = rep(seq(1, guesses_per_round), times = n_rounds),
                      choice = values$df[, 3],
                      actual = rep(true_state, each = guesses_per_round))
    colnames(out) <- c("Round", "Guess no.", "Your choice", "Correct/True value")
    out
  },
  options = list(paging = FALSE, searching = FALSE, ordering = FALSE)
  )

  # Tell user where she is
  output$round_info <- renderText({
    paste0("Round ", ceiling(values$round / guesses_per_round), " of ", n_rounds)
  })
})
Phew, there are lots of new things. Let me go from top to bottom. First, I initialize an object called values
and this is a reactive object. Again the help file has a concise description of its capabilities:
“This function returns an object for storing reactive values. It is similar to a list, but with special capabilities for reactive programming. When you read a value from it, the calling reactive expression takes a reactive dependency on that value, and when you write to it, it notifies any reactive functions that depend on that value. (…)”
The reactive variable values$round
serves as a counter or iterator that I use to keep track of the round the user is currently in. Most experiments will have participants play multiple rounds of the same task and this variable is there to record in which round users currently are.
Next comes the main handler. The main handler observes the value of input$submit
. If the user clicks it, the stuff in curly brackets is executed. Inside the curly brackets, we first increase values$round
by one. Because this is a reactive value, it will broadcast its new value to all functions that depend on it. If you scroll down to the output$table
you will note that, e.g., the data table on the right depends on values$round
and hence will get updated, once values$round
changes.
Back to the main handler. Next the code creates an object called newLine
which is a row in our later data matrix. We write it to values$df
. Next we check whether the previous round was the final round. This is the case if the iterator values$round
exceeds the hard-coded number of rounds given by n_guesses
. If the final round was reached, we do four things:
Prepare a text message for the final screen which takes into account whether the user was lucky and will receive a bonus for having guessed correctly.
Record the resulting payoff in the data frame.
Save all received responses to disk.
Say good-bye by loading the final screen.
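The payoff rule buried in the renderText call above can be factored into a one-liner (a hypothetical helper for illustration, not part of the app code): the prize for a correct guess in the drawn round plus the show-up fee.

```r
# Hypothetical helper mirroring the payoff rule in the main handler:
# one Euro (prize) for guessing the drawn round correctly, plus the show-up fee.
payoff <- function(guess, truth, prize = 1, show_up = 10) {
  ifelse(guess == truth, prize, 0) + show_up
}

payoff("Heads", "Heads")  # correct guess: prize + show-up fee
payoff("Heads", "Tails")  # incorrect guess: show-up fee only
```

Because it is vectorized via ifelse, the same helper could also fill the payoff column of the whole data frame at once.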
The astute reader will have noticed numerous functions that I call inside server.R
, but that are nowhere defined. I follow Dean Attali’s style here and defer them to the helpers file:
# which fields get saved
fieldsAll <- c("user", "guess")
# self-explanatory
responsesDir <- file.path("responses")
# Password to login for this session
session_password <- "foo"

### Generate data here ###
set.seed(1906)
n_rounds <- 2
n_flips <- 3
probs <- c(0.6, 0.4)
prize <- 1
show_up <- 10
probas <- array(NA, c(n_rounds, 2))
true_state <- sample(c("Heads", "Tails"), n_rounds, replace = TRUE)
for (i in 1:n_rounds) {
  if (true_state[i] == "Heads") {
    probas[i, ] <- probs
  } else {
    probas[i, ] <- probs[2:1]
  }
}
flips <- sapply(1:n_rounds, function(x) sample(c(1, -1), n_flips, replace = TRUE, prob = probas[x, ]))
flips <- data.frame(flips)

cascade <- function(x, thin) {
  tmp <- rep(1, n_flips) %x% x
  dim(tmp) <- c(n_flips, n_flips)
  tmp[lower.tri(tmp)] <- NA
  tmp[tmp == 1] <- "Heads"
  tmp[tmp != "Heads"] <- "Tails"
  if (thin > 1) {
    tmp <- tmp[, seq(1, ncol(tmp), thin)]
  }
  return(tmp)
}
tmp <- lapply(flips, cascade, thin = 2)
flips <- do.call(cbind, tmp)
n_guesses <- ncol(flips)
guesses_per_round <- n_guesses / n_rounds

# add an asterisk to an input label
labelMandatory <- function(label) {
  tagList(
    label,
    span("*", class = "mandatory_star")
  )
}

# CSS to use in the app
appCSS <- ".mandatory_star { color: red; }
.shiny-input-container { margin-top: 25px; }
.shiny-progress .progress-text {
  font-size: 18px;
  top: 50% !important;
  left: 50% !important;
  margin-top: -100px !important;
  margin-left: -250px !important;
}"

# Helper functions
humanTime <- function() format(Sys.time(), "%d-%m-%Y-%H-%M-%S")

saveData <- function(data) {
  fileName <- sprintf("%s_%s.csv", humanTime(), digest::digest(data))
  write.csv(x = data, file = file.path(responsesDir, fileName),
            row.names = FALSE, quote = TRUE)
}

payoffRound <- function(user) {
  set.seed(user)
  out <- sample(seq(1, n_guesses), 1)
  return(out)
}

epochTime <- function() {
  as.integer(Sys.time())
}
The helper file begins with defining the user inputs that will be saved along with a timestamp etc.
Another thing the helpers file handles is generating the data. The coin flips are not drawn live during the experiment, but before the first screen is loaded. Because I set a static seed, all users will see the same sequence of flips.
Finally, several workhorse functions are defined. You may find a less thoroughly commented version of the code on GitHub.
An idea that I like to borrow from Joern Hees is the possibility to use the username (or part of it) as a seed for the random numbers that will be drawn during the experiment. Depending on your analysis and the design of the experiment this can be very useful. For example, if you have two groups, one treatment and one control group, and you want to show one subject in the control group and one in the treatment group exactly the same stimuli, you can do this by conditioning all random numbers on their usernames. For example, take users A-01
and B-01
. If we denote groups by A and B, then it’s easy to extract the two-digit number using a regular expression and take it as a seed for a random number generator.
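A sketch of that idea (the regular expression and helper names are mine): strip the group prefix, use the remaining digits as a seed, and subjects A-01 and B-01 then see identical random draws.

```r
# Hypothetical sketch: extract the two-digit number from a username like "A-01"
# and use it as the RNG seed, so matched subjects across groups get the same stimuli.
seed_from_user <- function(user) as.numeric(sub("^[A-Z]+-", "", user))

draws_for <- function(user, n = 3) {
  set.seed(seed_from_user(user))
  runif(n)
}

# A-01 and B-01 share the seed 1 and hence the same sequence of draws
```

Note that the actual app above simply calls as.numeric(input$user), which is why the username itself must be numeric there.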
The type of experiments that can be implemented in this way is still fairly limited. For example, I have not tried to program an experiment where different users interact, e.g. to run experiments that inspect how behavior changes within a group, or how certain types of communication affect outcomes.
But given there are examples of chat rooms being implemented using shiny, I see a chance this can be done.
Even though the length of the post and my messy code can look daunting and give the impression that this is not much easier than becoming a web programmer, it certainly is easier. Shiny handles a lot of the server structure, the initialization of databases, sessions, and much more under the hood. The simple experiment I implemented above is more of a proof of concept and you may wish to expand it in many ways. I have certainly shown only a very narrow set of the things that are possible within shiny.
Thus, in case you have just started running experiments, want more flexibility than zTree offers, and don't want to hassle with the gory details, shiny might be an alternative for you.
The glm
function in R comprises several interesting members that are standard tools in machine learning and data science, e.g. the logistic regression model.
Looking at the scatter plots shown in Markus’ post reminded me of the amazing talk by Michael Betancourt (there are actually two videos, but GPs only appear in the second – make sure you watch them both!). In one of the examples, he uses a Gaussian process with a logistic link function to model data on the acceptance of gay marriage as a function of age. The results he presented were quite remarkable and I thought that applying the methodology to Markus’ ice cream data set was a great opportunity to learn what a Gaussian process regression is and how to implement it in Stan. I initially planned not to spend too much time on the theoretical background, but to get to the meat and potatoes quickly, i.e. try them in practice on a data set, see how they work, make some plots etc. But all introductory texts that I found were either (a) very mathy, or (b) superficial and ad hoc in their motivation. I wasn’t satisfied and had the feeling that GPs remained a black box to me. Maybe you had the same impression and have now landed on this site?
I was therefore very happy to find this outstanding introduction by David MacKay (DM). I think it is just perfect – a meticulously prepared lecture by someone who is passionate about teaching.
This provided me with just the right amount of intuition and theoretical backdrop to get to grip with GPs and explore their properties in R and Stan.
In this post I will follow DM’s game plan and reproduce some of his examples which provided me with a good intuition of what a Gaussian process regression is. In the words of David MacKay: “Throwing mathematical precision to the winds, a Gaussian process can be defined as a probability distribution on a space of functions (…)”. In a future post, I will walk through an implementation in Stan, i.e. show how a GP regression can be fitted to data and used for prediction.
Let’s start with a standard definition of a Gaussian process:
Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
That’s a fairly general definition, and moreover it’s not at all clear what such a collection of rv’s has to do with regressions. To draw the connection, let me plot a bivariate Gaussian:
library(mixtools)
library(mvtnorm)

K <- matrix(c(1, 0.9093729, 0.9093729, 1), ncol = 2)
set.seed(1906)
p <- rmvnorm(1, mean = rep(0, 2), sigma = K)
plot(p, type = "n", xlim = c(-5, 5), ylim = c(-5, 5),
     ylab = expression(y[2]), xlab = expression(y[1]),
     main = "Fig. 1: Contours of bivariate Gaussian distribution")
abline(h = 0, v = 0)
points(p, cex = 2, pch = 20)
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.05, npoints = 250, col = "darkred")
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.2, npoints = 250, col = "red")
p
##           [,1]      [,2]
## [1,] -1.116858 -1.196803
The mean and covariance are given in the R code above. The point p
has coordinates $y_1 = -1.12$ and $y_2 = -1.20$. One thing we can glean from the shape of the ellipse is that if $y_1$ is negative, $y_2$ is likely to be negative as well, and vice versa. There is positive correlation between the two.
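As a quick sanity check (not in the original post), one can sample from the same bivariate Gaussian in base R only, without mvtnorm, using a Cholesky factor of K: if R = chol(K), then z %*% R has covariance t(R) %*% R = K for standard normal z.

```r
# Sample from N(0, K) with base R: chol(K) returns the upper triangular R
# with K = t(R) %*% R, so z %*% R for z ~ N(0, I) has covariance K.
K <- matrix(c(1, 0.9093729, 0.9093729, 1), ncol = 2)
set.seed(1906)
z <- matrix(rnorm(2 * 50000), ncol = 2)
y <- z %*% chol(K)
empirical_cor <- cor(y[, 1], y[, 2])  # close to 0.909
```

With 50,000 draws the empirical correlation sits very close to the 0.9093729 in K, confirming the positive dependence visible in the ellipse.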
To draw the connection to regression, I plot the point p
in a different coordinate system
plot(c(1, 2), p, xlim = c(0, 3), ylim = c(-2, 0), type = "p", pch = 20, cex = 2,
     xlab = "x", ylab = "y", main = "Fig. 2: Points in x-y-plane")
where $y = (y_1, y_2)$ is as before, but now the indexes $1$ and $2$ act as the explanatory/feature variable $x$. The $y$-coordinates give us the height of the points for each $x$. And there is really nothing sacred about the numbers $1$ and $2$. I could equally well call the coordinates in the first plot $y_{x_1}$ and $y_{x_2}$ and virtually pick any numbers to index them. This illustrates that the Gaussian process can be used to define a distribution over a function over the real numbers.
In our simple starting example, I can draw a line to connect the two dots, much as a regression line would do to illustrate this for two dimensions. But you maybe can imagine how I can go to higher dimensional distributions and fill up any of the gaps before, after or between the two points.
The upshot of this is: every point $(y_{x_1}, y_{x_2})$ from a bivariate Gaussian, with indexes $x_1$ and $x_2$ from an index set $X$, can be taken to define two points in the $x$-$y$ plane. And I deliberately wrote $x_1$ and $x_2$ instead of 1 and 2, because the indexes can be arbitrary real numbers. I can continue this simple example and sample more points (let me combine the graphs to save some space here):
par(mfcol = c(1, 2))
N <- 15
set.seed(1906)
p <- rmvnorm(N, mean = rep(0, 2), sigma = K)
plot(p, type = "n", xlim = c(-5, 5), ylim = c(-5, 5),
     ylab = expression(y[2]), xlab = expression(y[1]),
     main = "Fig. 3a: Multiple draws from the Gaussian")
abline(h = 0, v = 0)
points(p, cex = 2, pch = 20)
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.05, npoints = 250, col = "darkred")
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.2, npoints = 250, col = "red")
plot(rep(1, N), p[, 1], xlim = c(0, 3), ylim = c(-3, 3), type = "p", pch = 20, cex = 2,
     xlab = "x", ylab = "y", main = "Fig. 3b: Multiple draws in x-y-plane")
points(rep(2, N), p[, 2], pch = 20, cex = 2)
for (i in 1:N) {
  segments(x0 = 1, p[i, 1], x1 = 2, p[i, 2], col = i, lwd = 1.5)
}
I have also drawn the line segments connecting the sampled values from the bivariate Gaussian. This illustrates nicely how a zero-mean Gaussian distribution with a simple covariance matrix can define random linear lines in the right-hand side plot. A multivariate Gaussian is like a probability distribution over (finitely many) values of a function.
For now we only have two points on the right, but by going from the bivariate to the $n$-dimensional normal distribution I can get more points in. And keep in mind, I can also insert points in between – the domain is really dense, I need not take just integer values. What I do have to do in order to add more points is to specify the mean and the covariance.
Another instructive view on this is when I introduce measurement errors or noise into the equation
$y = f(x) + \epsilon$
where $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$.
With this my model very much looks like a non-parametric or non-linear regression model with some function $f(x)$.
If the Gaussian distribution that we started with is nothing, but a prior belief about the shape of a function, then we can update this belief in light of the data.
There is a nice way to illustrate how learning from data actually works in this setting. Say we get to learn the value of $y_1$. In terms of Fig. 3b this means we have to fix the left-hand point at that value, and any line segment connecting $y_1$ and $y_2$ has to originate from there. And maybe this gets the intuition across that this narrows down the range of values that $y_2$ is likely to take. It seems even more unlikely than before that, e.g., $y_2$ is negative.
We can try to confirm this intuition using the fact that if
$$K = \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix}$$
is the covariance matrix of the zero-mean Gaussian, we can deduce (see here)
$$y_2 \mid y_1 \sim \mathcal{N}\left(K_{21} K_{11}^{-1} y_1,\; K_{22} - K_{21} K_{11}^{-1} K_{12}\right).$$
This can be simplified to a normal over $y_2$
with mean $k\,y_1$ and variance $1 - k^2$, where $k = K_{12}$ is the covariance. Inserting the given numbers ($k \approx 0.909$, $y_1 = 1.1$), you see that the conditional mean is about $1.0$ and that the conditional variance is around $0.17$.
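Plugging the numbers into these formulas in base R confirms this (k and y1 are taken from the code below):

```r
# Conditional distribution of y2 given y1 for a zero-mean bivariate Gaussian
# with unit variances and covariance k: mean = k * y1, variance = 1 - k^2.
k <- 0.9093729
y1 <- 1.1
cond_mean <- k * y1   # about 1.00
cond_var <- 1 - k^2   # about 0.17
```

These are exactly the quantities y2.hat and y2.var computed in the updated figure code that follows.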
One noteworthy feature of the conditional distribution of $y_2$ given $y_1$ is that it does not make any reference to the functional form of $f$. In that sense it is a non-parametric prediction method, because it does not depend on specifying the function linking $y$ to $x$.
Let’s update Figure 3:
par(mfcol = c(1, 2))
set.seed(1906)
y1.actual <- 1.1
y2.hat <- 0 + y1.actual * K[1, 2]
y2.var <- (1 - K[1, 2]^2)
p2 <- data.frame(y1 = rep(y1.actual, N),
                 y2 = rnorm(N, mean = y2.hat, sd = sqrt(y2.var)))
plot(p2, type = "n", xlim = c(-5, 5), ylim = c(-5, 5),
     ylab = expression(y[2]), xlab = expression(y[1]),
     main = "Fig. 3a: Multiple draws from the two-dimensional Gaussian")
abline(h = 0, v = 0)
abline(v = y1.actual, lty = 2, col = "grey", lwd = 1.5)
points(p, cex = 2, pch = 20, col = "lightgrey")
points(p2, cex = 2, pch = 20)
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.05, npoints = 250, col = "darkred")
ellipse(mu = rep(0, 2), sigma = K, alpha = 0.2, npoints = 250, col = "red")
plot(rep(1, N), p2[, 1], xlim = c(0, 3), ylim = c(-3, 3), type = "n", pch = 20, cex = 2,
     xlab = "x", ylab = "y", main = "Fig. 3b: Multiple draws in x-y-plane")
points(rep(1, N), p[, 1], pch = 20, cex = 2, col = "lightgrey")
points(rep(2, N), p[, 2], pch = 20, cex = 2, col = "lightgrey")
for (i in 1:N) {
  segments(x0 = 1, p[i, 1], x1 = 2, p[i, 2], col = "lightgrey", lwd = 1.5, lty = 2)
}
for (i in 1:N) {
  segments(x0 = 1, p2[i, 1], x1 = 2, p2[i, 2], col = i, lwd = 1.5)
}
points(1, y1.actual, pch = 20, cex = 2)
points(rep(2, N), p2[, 2], pch = 20, cex = 2)
The conditional distribution is considerably more pointed and the right-hand side plot shows how this helps to narrow down the likely values of $y_2$.
The upshot here is: there is a straightforward way to update the prior GP to obtain simple expressions for the predictive distribution at points not in our training sample.
The connection to non-linear regression becomes more apparent if we move from a bivariate Gaussian to a higher-dimensional distribution. With more than two dimensions, I cannot draw the underlying contours of the Gaussian anymore, but I can continue to plot the result in the x-y-plane.
# Note: K is now the 6 x 6 covariance matrix printed below, and n (= 6) its dimension
p <- rmvnorm(N, mean = rep(0, n), sigma = K)
plot(1, type = "n", xlim = c(0, 7), ylim = c(-3, 5), xlab = "x", ylab = "y")
abline(h = 0, v = 0)
for (i in 1:n) {
  points(rep(i, N), p[, i], pch = 20, cex = 2)
}
for (i in 1:N) {
  for (j in 1:(n - 1)) {
    segments(x0 = j, p[i, j], x1 = j + 1, p[i, j + 1], col = i, lwd = 1.5)
  }
}
where again the mean of the Gaussian is zero and now the covariance matrix is
K
##            [,1]      [,2]      [,3]      [,4]      [,5]       [,6]
## [1,] 1.00000000 0.9093729 0.6838614 0.4252832 0.2187119 0.09301449
## [2,] 0.90937293 1.0000000 0.9093729 0.6838614 0.4252832 0.21871189
## [3,] 0.68386141 0.9093729 1.0000000 0.9093729 0.6838614 0.42528319
## [4,] 0.42528319 0.6838614 0.9093729 1.0000000 0.9093729 0.68386141
## [5,] 0.21871189 0.4252832 0.6838614 0.9093729 1.0000000 0.90937293
## [6,] 0.09301449 0.2187119 0.4252832 0.6838614 0.9093729 1.00000000
Having added more points confirms our intuition that a Gaussian process is like a probability distribution over functions. It also seems that if we would add more and more points, the lines would become smoother and smoother.
And in fact for the most common specification of Gaussian processes this will be the case, i.e. the GP prior will imply a smooth function.
Drawing more points into the plots was easy for me, because I had the mean and the covariance matrix given, but how exactly did I choose them? Randomly? I will give you the details below, but it should be clear that when we want to define a Gaussian process over an arbitrary (but finite) number of points, we need to provide some systematic way that gives us a covariance matrix and the vector of means. The former is usually denoted as $k(x, x')$ for any two (feature) vectors $x$ and $x'$ in the domain of the function. The latter is usually denoted as $m(x)$ and set to zero. With this one usually writes
$$f(x) \sim \mathcal{GP}\left(m(x),\, k(x, x')\right).$$
With $m(x)$ set to zero, the entire shape or dynamics of the process are governed by the covariance matrix. If you look back at the last plot, you might notice that the covariance matrix I set to generate points from the six-dimensional Gaussian seems to imply a particular pattern. For paths of the process that start above the horizontal line (with a positive value), the subsequent values drift lower. The other way around for paths that start below the horizontal line. Like in the two-dimensional example that we started with, the larger covariance matrix seems to imply positive autocorrelation.
Hence, we see one way we can model our prior belief. If we had a formula that returns covariance matrices generating this pattern, we would be able to postulate a prior belief for an arbitrary (finite) dimension.
The formula I used to generate the $ij$-th element of the covariance matrix of the process was
$$K_{ij} = \exp\left(-0.095\,(x_i - x_j)^2\right).$$
More generally, one may write this covariance function in terms of hyperparameters:
$$k(x, x') = \eta^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right).$$
Because $k$ is a function of the squared Euclidean distance between $x$ and $x'$, it captures the idea of diminishing correlation between distant points. How fast the exponential term tends towards unity is governed by the hyperparameter $\ell$, which is called the length scale. For large $\ell$ the term inside the exponential will be very close to zero and thus the kernel will be almost constant over large parts of the domain. The hyperparameter $\eta^2$ scales the overall variances and covariances and allows for an offset.
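The printed six-dimensional matrix K is consistent with a squared exponential kernel of the form $K_{ij} = \exp(-c\,(x_i - x_j)^2)$ with $c \approx 0.095$ on the integer inputs $x = 1, \dots, 6$. A quick base-R check (the helper name is mine):

```r
# Rebuild the 6 x 6 covariance matrix from a squared exponential kernel;
# the constant 0.095 reproduces the entries of the K printed above.
se_kernel <- function(x1, x2, c = 0.095) exp(-c * (x1 - x2)^2)
x <- 1:6
K <- outer(x, x, se_kernel)
```

The resulting K matches the printout entry by entry (e.g. K[1, 2] = 0.9093729, K[1, 6] = 0.09301449), so nothing about the earlier plots was chosen randomly.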
The squared exponential kernel is apparently the most common functional form for the covariance function in applied work, but it may still seem like a very ad hoc assumption about the covariance structure. It turns out, however, that the squared exponential kernel can be derived from a linear model of basis functions of $x$ (see section 3.1 here). In general, one is free to specify any function that returns a positive definite matrix for all possible $x$ and $x'$. Discussing the wide array of possible kernels is certainly beyond the scope of this post and I therefore happily refer any reader to the introductory text by David MacKay (see previous link) and the textbook by Rasmussen and Williams, who have an entire chapter on covariance functions and their properties.
Now that I have a rough idea of what a Gaussian process regression is and how it can be used to do non-linear regression, the question is how to make it operational. In terms of the Bayesian paradigm, we would like to learn what are likely values for the hyperparameters $\eta$ and $\ell$ (and the noise variance) in light of the data. Then we can determine the mode of this posterior (MAP). Likewise, one may specify a likelihood function and use hill-climbing algorithms to find the ML estimates. It is not too hard to imagine that for real-world problems this can be delicate. Especially if we include more than only one feature vector, the likelihood is often not unimodal and all sorts of restrictions on the parameters need to be imposed to guarantee that the result is a covariance function that always returns positive definite matrices.
Fitting a GP to data will be the topic of the next post on Gaussian processes.
I write my blog entries in RStudio using Rmarkdown and then push them to the WordPress servers. And the way I implemented this is much along the lines of what is described here.
Hence, first install the RWordPress
package
if (!require(RWordPress)) {
  install.packages("RWordPress", repos = "http://www.omegahat.org/R")
}
and link it to your account (insert your username for ‘yourWPusername’ and your password for ‘yourWPpassword’)
options(WordPressLogin = c(yourWPusername = "yourWPpassword"),
        WordPressURL = "https://yourWPusername.wordpress.com/xmlrpc.php")
You can do a quick check whether your credentials are working by calling
getUsersBlogs()
So far so good.
When you blog about R, I guess it’s reasonable to expect to produce many graphs to show on your blog. I handle graphs following Yihui Xie’s suggestion and save them in my Dropbox. I created a separate folder in my Public
folder called wp
that is a bucket for all the figures that I create. Then I tell knitr
that I want to save any figure created in this folder.
opts_knit$set(base.url = 'https://dl.dropboxusercontent.com/u/some_number/wp/',
              base.dir = 'C:/<path to dropbox>/Dropbox/Public/wp/')
The first line includes the URL of the wp
folder which you can find by putting a little file in it and right-clicking on it to copy the link. Then insert the copied link and the path to your newly created folder into the snippet above.
Suppose you have your blog post saved in a file post.Rmd
in the Rmarkdown .Rmd
format. Using the function knit2html
you can turn it into an HTML file (in RStudio simply press the ‘Knit HTML’ button at the top). To push the content of the HTML file to WordPress, one has to extract the body of the HTML file, otherwise WordPress.com won’t accept it (at least when you are hosting your blog with WordPress.com – which I do).
On the blog I linked above, the author provides a function to sanitize the HTML file and return a string that you can pass on to WordPress using the RWordPress function newPost. But for me it didn't work, because it inserted strange special characters here and there. After googling a bit, I found an alternative solution (I forgot where; feel free to point me to the original, and I will happily link to it):
knit2wp <- function(file) {
    require(XML)
    content <- readLines(file)
    content <- htmlTreeParse(content, trim = FALSE)
    ## WP will add the h1 header later based on the title, so delete it here
    content$children$html$children$body$children$h1 <- NULL
    content <- paste(capture.output(print(content$children$html$children$body,
                                          indent = FALSE, tagSeparator = "")),
                     collapse = "\n")
    content <- gsub("<?.body>", "", content)  # remove body tag
    ## enclose code snippets in SyntaxHighlighter format
    content <- gsub("<?pre><code class=\"r\">", "\\[code lang='r'\\]\\\n", content)
    content <- gsub("<?pre><code class=\"no-highlight\">", "\\[code\\]\\\n", content)
    content <- gsub("<?pre><code>", "\\[code\\]\\\n", content)
    content <- gsub("<?/code></pre>", "\\[/code\\]\\\n", content)
    return(content)
}
The way it now works for me is as follows. Suppose I have my post written in Rmarkdown and saved in the file test.Rmd. Set the current working directory to the location of this file and then run the following script:
library(RWordPress)
options(WordPressLogin = c(yourWPusername = "yourWPpassword"),
        WordPressURL = "https://yourWPusername.wordpress.com/xmlrpc.php")

library(knitr)
opts_knit$set(base.url = 'https://dl.dropboxusercontent.com/u/some_number/wp/',
              base.dir = 'C:/<path to dropbox>/Dropbox/Public/wp/')

knit2wp <- function(file) {
    require(XML)
    content <- readLines(file)
    content <- htmlTreeParse(content, trim = FALSE)
    ## WP will add the h1 header later based on the title, so delete it here
    content$children$html$children$body$children$h1 <- NULL
    content <- paste(capture.output(print(content$children$html$children$body,
                                          indent = FALSE, tagSeparator = "")),
                     collapse = "\n")
    content <- gsub("<?.body>", "", content)  # remove body tag
    ## enclose code snippets in SyntaxHighlighter format
    content <- gsub("<?pre><code class=\"r\">", "\\[code lang='r'\\]\\\n", content)
    content <- gsub("<?pre><code class=\"no-highlight\">", "\\[code\\]\\\n", content)
    content <- gsub("<?pre><code>", "\\[code\\]\\\n", content)
    content <- gsub("<?/code></pre>", "\\[/code\\]\\\n", content)
    return(content)
}

knit2html("test.Rmd")

newPost(
    list(
        description = knit2wp('test.html'),
        title = 'Test',
        categories = c('Programming', 'R'),
        mt_keywords = c('rstats', 'blogging', 'XML-RPC', 'R', 'knitr', 'markdown')
    ),
    publish = FALSE)
Et voilà!