Beyond passive consumers: engaging students more deeply with R packages

I have been thinking about how to engage students with R packages that may, at first, seem to make life incredibly easy for them, but which could make it more difficult for us to find assessment offenses.

The easystats family of packages is one such case.

One way to do this is to get them to think about additional features. For example, if the report package had an option for variable names formatted more nicely than they appear in the dataset, the automated text output would be a lot more elegant and publication-ready, even with just the classic iris dataset as an example.

See the following output, as generated using Quarto.

The ANOVA (formula: Sepal.Length ~ Species) suggests that:

The main effect of Species is statistically significant and large (F(2, 147) = 119.26, p < .001; Eta2 = 0.62, 95% CI [0.54, 1.00])

Effect sizes were labelled following Field’s (2013) recommendations.

Using the lm command instead:

library(report)  # provides report()
model <- lm(Sepal.Length ~ Species, data = iris)
report.output <- report(model)

The automated reporting (stored in report.output) is as follows:

We fitted a linear model (estimated using OLS) to predict Sepal.Length with Species (formula: Sepal.Length ~ Species). The model explains a statistically significant and substantial proportion of variance (R2 = 0.62, F(2, 147) = 119.26, p < .001, adj. R2 = 0.61). The model’s intercept, corresponding to Species = setosa, is at 5.01 (95% CI [4.86, 5.15], t(147) = 68.76, p < .001). Within this model:

The effect of Species [versicolor] is statistically significant and positive (beta = 0.93, 95% CI [0.73, 1.13], t(147) = 9.03, p < .001; Std. beta = 1.12, 95% CI [0.88, 1.37])
The effect of Species [virginica] is statistically significant and positive (beta = 1.58, 95% CI [1.38, 1.79], t(147) = 15.37, p < .001; Std. beta = 1.91, 95% CI [1.66, 2.16])

Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using a Wald t-distribution approximation.

Even the option to replace “Sepal.Length” with “Sepal Length” would really improve the appearance of the resulting statements, as would being able to save the actual reference used [which was Andy Field’s 2013 textbook Discovering Statistics using IBM SPSS Statistics].
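As a starting point for students, the relabelling idea can be prototyped outside the package with plain string replacement. The sketch below is only that: relabel_report is a hypothetical helper written in base R, not a function from report or easystats, and the example sentence is copied from the ANOVA output above.

```r
# Hypothetical helper (NOT part of the report package): swaps raw variable
# names for nicer labels in a piece of report text.
relabel_report <- function(text, labels) {
  # labels is a named character vector: names are the raw variable names,
  # values are the labels to show instead.
  for (old in names(labels)) {
    # fixed = TRUE treats "Sepal.Length" literally, so "." is not a regex wildcard
    text <- gsub(old, labels[[old]], text, fixed = TRUE)
  }
  text
}

labels <- c("Sepal.Length" = "Sepal Length")
example_text <- "The ANOVA (formula: Sepal.Length ~ Species) suggests that:"

relabel_report(example_text, labels)
```

Note that this naive version also rewrites the variable name inside the formula, which may not be what is wanted. Spotting and handling that kind of edge case is exactly the sort of thinking a feature proposal should prompt. The same call could in principle be applied to the text stored in report.output, assuming it can be treated as a character vector.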

This would allow students to engage more deeply with package contents rather than being passive users. Perhaps an assessment in the form of a portfolio of suggested features (with implementations where possible) would be an interesting platform.

Even better would be to use peer assessment to help determine which of the suggested ideas would be the most useful at their level of analysis.