Geographically Weighted Regression

A brief primer on GWR...

There are, perhaps, thousands of examples of the use of multiple regression modelling in geographical enquiry. Typically these will involve estimating the relationship between one variable and a set of predictor variables for a collection of geographical entities (often a set of points, or zones). As an illustration, we might have a model with two predictor variables:

$$y = b_0 + b_1 x_1 + b_2 x_2 + e$$

where y is the dependent variable, x1 and x2 are the independent variables, b0, b1 and b2 are the parameters to be estimated, and e is a random error term, assumed to be normally distributed. A basic assumption in fitting such a model is that the observations are independent of one another. With much geographical data, this is unlikely to be the case, because nearby observations tend to resemble one another. A second assumption is that the structure of the model remains constant over the study area; in other words, there are no local variations in the parameter estimates.

GWR permits the parameter estimates to vary locally; we can rewrite the model in a slightly different form:


$$y(g) = b_0(g) + b_1(g) x_1 + b_2(g) x_2 + e$$

where (g) indicates that the parameters are to be estimated at a location whose coordinates are given by the vector g.

How do we estimate the parameters for such a model?

Using ordinary least squares (OLS), the parameters for a linear regression model can be obtained by solving:


$$b = (X^T X)^{-1} X^T Y$$
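As a minimal sketch of the OLS solution above, the following NumPy code solves the normal equations for the two-predictor model. The function name `ols_fit` and the synthetic data are assumptions made for illustration only; they do not come from any GWR software.

```python
import numpy as np

def ols_fit(X, y):
    """Solve the normal equations (X^T X) b = X^T y for b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data (illustrative): y = 1 + 2*x1 - 0.5*x2 + e
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
e = rng.normal(scale=0.1, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + e

# The column of ones carries the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])
b = ols_fit(X, y)
print(b)  # estimates of b0, b1, b2
```

The result is a single set of parameters for the whole data set, which is the "constant structure" assumption described earlier.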


The parameter estimates for GWR may be obtained in the same way, with the addition of a weighting scheme:

$$b(g) = (X^T W(g) X)^{-1} X^T W(g) Y$$
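The weighted estimator can be sketched in a few lines, taking the weights for one location g as a given input. The function name `weighted_fit` and the toy data are hypothetical, and W(g) is assumed to be a diagonal matrix holding one weight per observation.

```python
import numpy as np

def weighted_fit(X, y, w):
    """Solve (X^T W X) b = X^T W y, where W = diag(w)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Broadcasting scales column j of X^T by w[j]; this equals
    # X.T @ np.diag(w) without building the n-by-n matrix W.
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)

# Toy data (illustrative): the slope is 1 in one half of the
# observations and 3 in the other half.
n = 100
x1 = np.linspace(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x1])
near = np.arange(n) < n // 2
y = np.where(near, 2.0 + 1.0 * x1, 2.0 + 3.0 * x1)

b_equal = weighted_fit(X, y, np.ones(n))         # equal weights: plain OLS
b_local = weighted_fit(X, y, near.astype(float)) # only the first half counts
print(b_equal, b_local)
```

With equal weights the estimator reduces to OLS; with weights concentrated on one group of observations it recovers the relationship that holds for that group.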

Here W(g) is a diagonal matrix whose ith diagonal element is the weight given to observation i. The weights are chosen such that observations near the point in space where the parameter estimates are desired have more influence on the result than observations further away. Two weighting functions we have used are (a) bi-square and (b) Gaussian. In the case of the Gaussian scheme, the weight for the ith observation is:

$$w_i(g) = \exp\left(-(d/h)^2\right)$$

where d is the Euclidean distance between the location of observation i and location g, and h is a quantity known as the bandwidth: the larger h is, the more slowly the weights decay with distance. (There are similarities between GWR and kernel regression.) One characteristic that is not immediately obvious is that the locations at which parameters are estimated need not be the ones at which the data have been collected.
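The two weighting schemes might be coded as below. The Gaussian weight is read here as exp(-(d/h)^2), which is an interpretation of the formula above. The bi-square form, (1 - (d/h)^2)^2 inside the bandwidth and zero outside, is a commonly used definition assumed for this sketch, since the text does not spell it out. All function names are hypothetical.

```python
import numpy as np

def distances(coords, g):
    """Euclidean distance from each observation location to g."""
    coords = np.asarray(coords, dtype=float)
    g = np.asarray(g, dtype=float)
    return np.sqrt(((coords - g) ** 2).sum(axis=1))

def gaussian_weights(d, h):
    """Gaussian scheme, read as exp(-(d/h)^2); never exactly zero."""
    return np.exp(-(d / h) ** 2)

def bisquare_weights(d, h):
    """Bi-square scheme (assumed form): zero at and beyond distance h."""
    u = d / h
    return np.where(u < 1.0, (1.0 - u ** 2) ** 2, 0.0)

# Three observation locations, and an estimation location g that
# is not one of them.
coords = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 10.0]])
g = np.array([0.0, 0.0])

d = distances(coords, g)            # distances 5, 10, 10
wg = gaussian_weights(d, h=5.0)
wb = bisquare_weights(d, h=10.0)
print(d, wg, wb)
```

Note that g is an arbitrary point: nothing in the weight calculation requires it to coincide with an observation.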


The resulting parameter estimates may be mapped in order to examine how they vary locally. One might also map the standard errors of the parameter estimates. Hypothesis tests are possible: for example, one might wish to test whether the variations in the values of a parameter across the study area are due to chance.
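To illustrate how a mappable surface of parameter estimates could be produced, the sketch below fits the weighted model at each node of a regular grid, using Gaussian weights read as exp(-(d/h)^2). The function `gwr_surface`, the grid, the bandwidth of 2.0 and the synthetic data (a slope that drifts from west to east) are all assumptions for illustration. Standard errors and hypothesis tests are not covered here.

```python
import numpy as np

def gwr_surface(coords, X, y, grid_points, h):
    """Local parameter estimates at each grid point (Gaussian weights)."""
    est = np.empty((len(grid_points), X.shape[1]))
    for k, g in enumerate(grid_points):
        d = np.sqrt(((coords - g) ** 2).sum(axis=1))
        w = np.exp(-(d / h) ** 2)
        XtW = X.T * w  # X^T W(g) without building the diagonal matrix
        est[k] = np.linalg.solve(XtW @ X, XtW @ y)
    return est

# Synthetic study area (illustrative): 400 points in a 10 x 10 square.
rng = np.random.default_rng(1)
n = 400
coords = rng.uniform(0.0, 10.0, size=(n, 2))
x1 = rng.normal(size=n)
true_slope = 1.0 + 0.3 * coords[:, 0]  # slope increases eastwards
y = 2.0 + true_slope * x1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x1])

# A 5 x 5 grid of estimation locations, none of them data points.
gx, gy = np.meshgrid(np.linspace(1.0, 9.0, 5), np.linspace(1.0, 9.0, 5))
grid_points = np.column_stack([gx.ravel(), gy.ravel()])

est = gwr_surface(coords, X, y, grid_points, h=2.0)
slope_map = est[:, 1].reshape(gx.shape)  # rows follow y, columns follow x
print(np.round(slope_map, 2))
```

The array `slope_map` can be passed to any plotting or GIS tool; in this synthetic case the estimated slope rises from the western columns to the eastern ones.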

The bandwidth may be either supplied by the user or estimated using a technique such as cross-validation.
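One way cross-validation could be carried out is a leave-one-out scheme: for each candidate bandwidth, every observation is predicted from a local fit that excludes it, and the bandwidth with the smallest sum of squared prediction errors is kept. This is a sketch under stated assumptions, not the procedure of any particular GWR package. The function names, the candidate list and the synthetic data are hypothetical, and the Gaussian weight is read as exp(-(d/h)^2).

```python
import numpy as np

def cv_score(coords, X, y, h):
    """Leave-one-out sum of squared prediction errors for bandwidth h."""
    sse = 0.0
    for i in range(len(y)):
        d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
        w = np.exp(-(d / h) ** 2)
        w[i] = 0.0  # exclude observation i from its own local fit
        XtW = X.T * w
        b = np.linalg.solve(XtW @ X, XtW @ y)
        sse += (y[i] - X[i] @ b) ** 2
    return sse

def choose_bandwidth(coords, X, y, candidates):
    """Return the candidate with the lowest score, plus all scores."""
    scores = [cv_score(coords, X, y, h) for h in candidates]
    return candidates[int(np.argmin(scores))], scores

# Synthetic data (illustrative) with a slope that varies over space.
rng = np.random.default_rng(2)
n = 100
coords = rng.uniform(0.0, 10.0, size=(n, 2))
x1 = rng.normal(size=n)
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x1])

# The very large last candidate behaves almost like a global OLS fit.
candidates = [1.5, 3.0, 6.0, 1000.0]
best_h, scores = choose_bandwidth(coords, X, y, candidates)
print(best_h, scores)
```

A simple grid of candidates keeps the example short; a one-dimensional optimiser over h would be a natural refinement. Very small bandwidths can leave too few effectively weighted observations for a stable local fit, so the candidate range needs some care.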