Student Projects: Spatial Statistics

At this time of the year, I once again start to think about how to create interesting, but feasible, projects for final year students. Students often have their own particular interests, and I try to work with them to develop project ideas that will hold their interest for an academic year.

Recently, I have been focusing primarily on projects with a spatial element, for a number of reasons:

  1. They go beyond what students are taught in any particular module on their degree programme.
  2. Much of the available public and government data has a spatial element.
  3. They encourage students to use R rather than SPSS or Minitab (the other statistics packages that we teach our students).
  4. They look good on a CV, as it is unusual to see analysis and modelling of spatial data at undergraduate level.

I mainly recommend a single textbook to students: Applied Spatial Data Analysis with R by R.S. Bivand, E. Pebesma and V. Gómez-Rubio. This is a great book for those learning spatial statistics.

As we mainly use Generalised Additive Models when analysing the data, the framework that I use for explaining the concepts tends to be:

  • (Multiple) Linear Regression: response variable continuous, explanatory variable(s) continuous

E[y|x]=\beta_0+\beta_{1}x_{1}+\cdots+\beta_{p}x_{p}

  • General Linear Models: response variable continuous, explanatory variable(s) may be categorical or continuous
  • Additive Models: response variable continuous, model uses functions of explanatory variables

E[y|x]=\beta_0 + f_{1}(x_{1})+\cdots+f_{p}(x_{p})

  • Generalised Linear Models: response variable not necessarily continuous (could be binomial or Poisson), explanatory variable(s) may be categorical or continuous

g\left(E[y|x]\right)=\beta_0+\beta_{1}x_{1}+\cdots+\beta_{p}x_{p}

  • Generalised Additive Models: response variable not necessarily continuous (similar to Generalised Linear Models), model (may) use functions of (some of) the explanatory variables.

g\left(E[y|x]\right)=\beta_0 + f_{1}(x_{1})+\cdots+f_{p}(x_{p})
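
To make the link function g a little more concrete, here is a minimal sketch of a Poisson Generalised Linear Model with a log link and a single explanatory variable, fitted by iteratively reweighted least squares. It is written in plain Python only so that the mechanics are visible. The function name fit_poisson_glm and the counts are made up for illustration, and this is not the code my students use.

```python
import math


def fit_poisson_glm(x, y, max_iter=25, tol=1e-10):
    """Fit log(E[y|x]) = b0 + b1 * x by iteratively reweighted least squares.

    Single explanatory variable only; an illustrative sketch, not a
    replacement for a statistics package.
    """
    # Start from the intercept-only model: b0 = log(mean(y)), b1 = 0.
    b0 = math.log(sum(y) / len(y))
    b1 = 0.0
    for _ in range(max_iter):
        # Accumulate the weighted sums needed for the 2x2 normal equations.
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi        # linear predictor
            mu = math.exp(eta)        # inverse of the log link
            w = mu                    # IRLS weight for Poisson with log link
            z = eta + (yi - mu) / mu  # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Solve the weighted least squares problem for (b0, b1).
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1


# Hypothetical counts, made up purely for illustration.
x = [0, 1, 2, 3, 4, 5]
y = [2, 3, 3, 6, 8, 12]
b0, b1 = fit_poisson_glm(x, y)
print(f"log link: intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"fitted mean at x = 2: {math.exp(b0 + b1 * 2):.2f}")
```

The coefficients live on the log scale, so the fitted mean is recovered with exp. In R, the built-in glm function with family = poisson fits the same kind of model, and an additive version would replace the term \beta_{1}x_{1} with a smooth function f_{1}(x_{1}).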

This talk gives a very quick overview of GLM / GAM.

This year, I have students looking at the US primary election results at a county-by-county level (principally examining the within-party rather than the between-party distribution of votes) and others looking at cancer rates around Europe. Previous projects have looked at more economic data with a spatial element, but perhaps the future will involve more environmental applications.