
Great British Bake Off

After much discussion in class, the Great British Bake Off has come to a conclusion.  Less than four minutes after the announcement, Twitter had already accumulated over 5,000 tweets with #GBBO.

This gave me another opportunity to use wordclouds to visualise the resulting tweets.  A basic wordcloud – using only words with a frequency of at least 15 – is the first step (obviously removing "GBBO" itself, because every tweet contains it!).
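
For anyone curious, here is a minimal sketch of how such a cloud can be built in R. It assumes the tweet text is already sitting in a character vector called tweet_text (a name I've made up), and the package choices are mine rather than a record of exactly what I ran on the night:

```r
# Minimal sketch: basic wordcloud from tweet text, assuming the tweets are
# already in a character vector called tweet_text (hypothetical name).
library(tm)
library(wordcloud)

corpus <- VCorpus(VectorSource(tweet_text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
# drop stopwords and the hashtag word itself, since every tweet contains it
corpus <- tm_map(corpus, removeWords, c(stopwords("english"), "gbbo"))

tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# keep only words appearing at least 15 times
wordcloud(names(freq), freq, min.freq = 15)
```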

Tweets between 21:00 and 21:04 (BST) with #GBBO in the text

The next step was to try some sentiment analysis…

#GBBO tweets: coloured by the most frequent sentiment underlying the entire tweet.

A wonderful part of the Twitter feed in the immediate aftermath was the overwhelming sense of joy – as captured by this very quick snapshot!

Why is "Nadiya" now noticeably bigger than "final"?  The second plot only uses tweets where the sentiment behind them can be determined, so all those very short tweets that didn't really contain much information in their text were excluded from the final image.  The "names" of the sentiments are somewhat coarse: anger / fear might also be considered frustration in some instances.
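
A sketch of that filtering step, assuming the (now-archived) sentiment package and re-using the hypothetical tweet_text vector from above – tweets whose emotion cannot be determined come back as NA, which is exactly why the short, uninformative ones vanish:

```r
# Sketch: classify each tweet's dominant emotion; NA means the emotion could
# not be determined, so those tweets are dropped before the second cloud.
library(sentiment)  # archived on CRAN; provides classify_emotion()

emo  <- classify_emotion(tweet_text, algorithm = "bayes")[, "BEST_FIT"]
keep <- !is.na(emo)
classified_text <- tweet_text[keep]
classified_emo  <- emo[keep]
table(classified_emo)  # anger, disgust, fear, joy, sadness, surprise
```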

I’m sure that I could do a better job… but this was one piece of analysis where time was of the essence! If the bakers are under a three-hour deadline, then so should I be.


Rates and Replication Issues

More travelling for me… and during those travels, two stories struck me as worthy of comment. These were widely reported, but I’m using links from the Guardian for both stories.

Rates:

The revelation that “Thousands have died after being found fit for work”

http://www.theguardian.com/society/2015/aug/27/thousands-died-after-fit-for-work-assessment-dwp-figures

However, these figures do not come with any baseline mortality rates, which leads to the question of what the baseline should be.  What exactly should we compare these values to?  Some preliminary ideas include:

  1. Compare to the mortality rates of a typical fortnight of those on Employment and Support Allowance (ESA)
  2. Compare to the mortality rates of those on Jobseeker’s Allowance [probably not a good baseline measure]
  3. Compare to the general population – ignoring disability and employment status, but trying to account for other demographic factors.

The first option would be of immediate relevance.  If the mortality rate of those on ESA is lower than that of those who have been removed from ESA because they were declared “fit-for-work”, then there is an immediate and obvious major problem.
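
To make that comparison concrete, here is a sketch of the arithmetic with entirely invented numbers – only the structure of the comparison matters, not the values:

```r
# Hypothetical illustration of option 1: compare the death rate of those
# declared fit-for-work with the rate of those remaining on ESA.
# Every number below is invented purely to show the mechanics.
deaths_ffw  <- 240      # deaths among those found fit for work (made up)
persyrs_ffw <- 100000   # person-years of follow-up in that group (made up)
deaths_esa  <- 900      # deaths among those remaining on ESA (made up)
persyrs_esa <- 500000   # person-years on ESA (made up)

# crude rates per 1,000 person-years
1000 * c(ffw = deaths_ffw / persyrs_ffw, esa = deaths_esa / persyrs_esa)

# formal comparison of the two rates (rate ratio with confidence interval)
poisson.test(c(deaths_ffw, deaths_esa), c(persyrs_ffw, persyrs_esa))
```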

The second option isn’t really a runner.  Why? Well, although people have been ruled “fit-for-work”, this assessment does not state that they are fit and fully healthy.

The third option is a population baseline – an okay measure but obviously needing terms and conditions!

Not being able to compare these figures to anything meaningful makes them essentially meaningless: we can’t even really assess whether they are unexpectedly high or low!  Insufficient detail on the causes of death [related to the disability, to the assessment process, accidents etc.?] is another issue that would need to be examined before this could properly be deemed a fully fleshed-out story.

Replication:

The story of the attempt to replicate results from 100 major psychology studies (where just over 1/3 of studies could be replicated) is welcome in that it stirs debate about the direction of science.

http://www.theguardian.com/science/2015/aug/27/study-delivers-bleak-verdict-on-validity-of-psychology-experiment-results

We would like, as a community, to be able to claim that science progresses in leaps and bounds. It is more typically a long, hard slog, making tiny incremental progress.  The “publish or perish” culture is not healthy for long-term scientific thinking.  Unless a replication study sets out to confirm the results of many other studies [such as the one quoted in this story: http://www.sciencemag.org/content/349/6251/aac4716 ], checking other people’s results carefully can be seen as a waste of resources, both time and money.  Editors deem it not to be “innovative” and dismiss it out of hand, unless it contradicts the findings of a major study and thus has some controversy attached.

However, it is only by checking results that we can hope to strengthen the foundations of future research.  Who should provide funding for the “boring” (not innovative) process of replicating others’ results before the community can assume those results hold?  Furthermore, what is the point of doing these replication studies if nobody else knows that the study has been undertaken and whether the findings support or undermine the original results?  There is definitely a gap for a good open-access repository for the results of replication studies.  Without spending a large amount of time on the write-up, a simple record – the experimental design, any adaptations or amendments to the original study with the reasons for them, the results, and a headline of whether the original study was confirmed or undermined – would provide a useful tool for researchers.  If online versions of the original study could then be required to link to the relevant replication studies within the repository, I would be ecstatic.

With publication bias still a major issue, effect sizes and statistical significance of results in the published literature should be taken with a pinch of salt.   We shouldn’t have to replicate each study before relying on the findings, but who is going to properly lead the charge to ensure that replication studies are undertaken, and that the subsequent findings are accessible, in order to avoid wasted effort?


A Statistician on Holiday

I had the privilege of going on holiday to the European Juggling Convention in the beautiful setting of Bruneck / Brunico [yes, a town with two names] in Northern Italy.  This was my 11th EJC in a row [starting with Ptuj, Slovenia in 2005].

Earlier this year, I had set my students a piece of coursework modelling the arrival patterns of people attending this festival.  While their predictions about overall numbers were accurate, pretty much everything else was wrong. I had somewhat simplified the problem that I gave them to investigate.

What factors should have been included in the modelling process?

Cost of ticket – especially the difference between the pre-registration price and the cost of paying on the door.  In recent years the difference between these prices has been deliberately increased by organising teams and it is very difficult to plan for those who arrive without registering in advance.  How does a team know how many shows, showers, camping space and toilets are needed for everyone to enjoy the week in comfort unless they have the numbers in advance?

With a greater proportion of people pre-registering, more arrive during the first weekend to get full value out of their ticket and fewer arrive during the night as the convention progresses, meaning that the registration desks can be completely closed down for a good six hours every night for the majority of the week. This was another feature that some, but not all, students picked up on as being a trend.

Ease of registration system – this year Andi did a wonderful job in creating a new system that made onsite processing extremely quick.  This meant that historical information on how long it took to process each arrival no longer applied, and predictions about the number of desks needed and whether there would be a persistent queue were well off the mark.  However, the prediction that nine desks would be an appropriate number for the first few hours was correct!  The persistent queue had vanished by 14:30 on the first Saturday because of Andi’s system.  This was my first real observed practical use for QR codes.  We ended up being able to fully process a person in an average of about 90 seconds.

(the system in action!)
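
As a rough illustration of the kind of back-of-the-envelope check behind the desk predictions (and not the model we actually used), here is a minimal sketch assuming Poisson-style arrivals; the peak arrival rate is invented, while the nine desks and the 90-second processing time come from above:

```r
# Back-of-the-envelope desk calculation; the arrival rate is hypothetical,
# only the nine desks and 90-second service time come from the post.
service_rate <- 3600 / 90  # arrivals each desk can clear per hour (= 40)
desks        <- 9
arrival_rate <- 300        # hypothetical peak arrivals per hour

utilisation <- arrival_rate / (desks * service_rate)
utilisation  # below 1 means the desks keep up and no persistent queue forms
```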

The data that I gave students was fictional but plausible, simulated so that everyone had different data to work with while sharing the same underlying structure for consistency.  It gave them a chance to examine a real problem, with relevance that they could see for event planning.  Incomplete information is an issue that we constantly face in real life, so it is interesting to see the direct consequences of missing information in action, especially when examining the evolution of behaviour over a number of years.
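
A minimal sketch of how such per-student datasets might be generated – my actual simulator and all of the parameters below differed, so treat this purely as illustrative:

```r
# Hypothetical sketch: simulate hourly arrival counts over a festival week,
# seeded per student so everyone gets different data with the same structure.
simulate_arrivals <- function(student_id, days = 7) {
  set.seed(student_id)                              # different data per student
  hours <- 0:23
  day_effect  <- exp(-0.35 * (seq_len(days) - 1))   # arrivals tail off over the week
  hour_effect <- dnorm(hours, mean = 14, sd = 4)    # afternoon peak (made up)
  rate <- 400 * outer(day_effect, hour_effect)      # expected arrivals per hour

  data.frame(
    day      = rep(seq_len(days), each = length(hours)),
    hour     = rep(hours, times = days),
    arrivals = rpois(length(rate), as.vector(t(rate)))
  )
}

head(simulate_arrivals(student_id = 42))
```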


May you live in interesting times

It’s been a very busy month for Greece.  Not in a particularly good way.

Inspired by Oliver Marsh’s talk (@SidewaysScience) at the Science in Public 2015 Conference, I pulled data from Twitter feeds relating to Greece.  It took much trial and error, and I discovered that you can only easily access data from about the last nine days via the API (which I accessed directly through R).
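
For the record, a sketch of the kind of pull involved, assuming the twitteR package; the query, tweet count and dates below are placeholders rather than what I actually submitted:

```r
# Sketch of pulling one day's tweets via the Search API, assuming the twitteR
# package; the Search API only reaches back about nine days, hence the need
# to collect data close to the events themselves.
library(twitteR)

# setup_twitter_oauth() with your API credentials goes here

tweets <- searchTwitter("greece", n = 5000,
                        since = "2015-07-06", until = "2015-07-07")
tweet_text <- sapply(tweets, function(t) t$getText())
length(tweet_text)
```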

To visualise this complex data, I have used wordclouds, coloured by emotions as classified by the sentiment package in R.

I picked four days:

  1. Tuesday June 30th: on this day, Greece missed a payment to the IMF.  Banks had been closed on June 29th, after the referendum was announced and approved by the Greek parliament over the previous weekend. Capital controls were in place, with a daily withdrawal limit of €60.
  2. Monday July 6th: the immediate aftermath of the referendum that returned a decisive “OXI” (no) vote.
  3. Monday July 13th: a deal was struck in Europe, with conditions harsher than those rejected in the referendum vote.
  4. Thursday July 16th: the Greek Parliament began to approve measures as required by creditors; there were some violent protests in Greece that evening.

Tuesday June 30th:

Wordcloud of tweets, Tuesday June 30th

Surprisingly, Angela Merkel and François Hollande featured more prominently than any Greek politicians on Twitter on June 30th!  This may be because their names are easier to spell, making spellings more consistent.  This is one problem with standard text analysis: spelling mistakes are not accounted for unless a lot of prior data cleaning is undertaken.

Monday July 6th:

Wordcloud of tweets, Monday July 6th

The immediate reaction on Twitter to the referendum results was a combination of surprise, disgust and fear.  The tweets in English, at the very least, did not contain much joy or jubilation after the No vote.  Even at the time, it seems those on Twitter were very aware that a no vote wasn’t really going to help the Greek cause.

Monday July 13th:

Wordcloud of tweets, Monday July 13th

This day marked the realisation of the capitulation of the Greek government to the demands of the creditors.  The third week of capital controls in Greece continued.

Thursday July 16th:

Wordcloud of tweets, Thursday July 16th

The disgust, relating both to the violent protests and to the imposition, against the public vote, of the harsh conditions attached to the bailout, is evident in the Twitter stream [over 90,000 tweets] on July 16th.  The names of Greek politicians are more prominent here than on June 30th.


Thoughts on Being a Judge

Well, the results have been announced so I am now somewhat at liberty to discuss my thoughts on the submissions to the RSS Excellence in Statistical Journalism awards [http://www.statslife.org.uk/news/2324-rss-journalism-awards-2015-winners-announced].  There were some very worthy winners!

The Good:

Some of the submissions were excellent: well-pitched to their target audience, of appropriate length and detail, and capable of becoming really good teaching examples. A slight downside to this was the difficulty of placing submissions in the most appropriate categories; one of the lessons learned from the judging point of view is to be stricter in having those submitting the pieces select one and only one category. One piece that I really enjoyed, but that didn’t earn a prize, was from More or Less. It faced tough competition in its category, and was slightly hampered by not exactly “fitting” the categories.  Another example of really good practice was the provision of supporting material: background on how the data was sourced and the terms and conditions applied to the data.

The Bad:

Pieces that were much too long. Although the rules stated that longer pieces could be submitted by agreement, something more than ten times the standard limit cannot reasonably be expected to engage a typical reader. Why were these bad? In general, they tried to tell too many stories rather than concentrating on telling the story of a single issue well.

The Ugly:

Graphs! Seriously, there were some really bad graphs. Doughnut plots were especially poor – even worse than pie charts. The poor use of graphs was particularly disappointing given that many of the submissions came from teams working with graphics departments. In other graphs, the kitchen sink was thrown into the process. The teams would do well to read Tufte’s “Principles of Graphical Excellence”. While I wouldn’t recommend following those recommendations to the letter, understanding that simplicity and clarity are important in graphs would be a great start.

And Finally:

The judging process itself was quite efficient, with judges ranking submissions in individual categories. Then, before the main judging day, those involved in the final judging process were sent all the pieces that had been shortlisted. This allowed for very quick decisions in some categories, and time for interesting discussions of the pieces in other cases.


Athena SWAN ceremony

Today was spent at the Athena SWAN awards ceremony for institutions and university departments, held at the University of Greenwich. I’ve been on the departmental Self-Assessment Team since the beginning, so was one of those selected to represent our department.

The general consensus is that the process needs to be made more consistent, especially if people are to be recruited and retained on the judging panels. One idea that found favour amongst those I talked to was a standardisation of the data presentation.

Essentially, each applicant team would enter the data into a pre-designed spreadsheet, selecting the appropriate groupings of academics to be compared against for the benchmarking process. Every application team currently has to track down benchmarking data – surely it must be more efficient to have this centralised and provided to applicants; otherwise, for completeness of reviewing, the benchmarking data used for each application must also be verified.

If the benchmarking data was provided, with space for the applicant teams to insert their data, this could be used to generate graphs and tables that are consistent, allowing the reviewing panels to compare different submissions at a glance.

Teams could then concentrate their focus on creating useful action plans, monitoring progress and producing much tighter, shorter applications that refer to the automatically generated tables and figures. This would, in turn, encourage more people to put themselves forward as panel members. I know that, as it stands, I can think of very few things that I would less like to spend my time on than wading through a pile of submissions.

In the midst of all the self-congratulation, there was some discussion of the Tim Hunt affair. To me, thus far, there has been a lack of engagement with the issues raised by his “joke” by those running Athena SWAN. The question of whether women require more emotional support and encouragement in research settings – especially in those labs where they are in the minority – is being looked at by many involved in the Athena SWAN process. The issue of confidence rather than competence is a major problem that needs to be addressed when we are trying to widen participation in STEMM [Science, Technology, Engineering, Mathematics and Medicine] subjects. If Hunt had focused his remarks on this – and questioned why this is the case – he would have been applauded.

The response [especially the Twitter hashtag #distractinglysexy] has been pretty good-humoured. Despite some people saying that there has been a witch-hunt, my impression is that the majority of those responding to the furore have been playing the ball rather than the man.

The reaction has also allowed us to note that comments such as Hunt’s support the reasons for Athena SWAN’s existence: these dinosaur-like opinions are still out there, especially amongst the older, more senior and often quite influential academics. It gives us a prime chance to motivate proposed changes and to show how they can be advantageous to (almost) everyone.


The aftermath

So, given the results of the election, a few things are up for discussion:

  • The polls got it wrong – why?
  • There seems to be an interesting relationship between turnout in each constituency and which party won.
  • The Lib Dem vote collapsed into the Tories.
  • UKIP failed to achieve the concentration of voters that the FPTP system requires.

I’m currently typing in all the data from the GB constituencies, as I haven’t yet been able to find it in a decently machine-readable format – I don’t have access to the Press Association feeds to get a clean version.  Of all the above, at this stage, it looks as if turnout will be the most (personally) interesting; it also hasn’t really been discussed at the same length as the other points.
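
Once the typing is done, the first look at the turnout question will probably be something as simple as this sketch – the file and column names are mine, not a real dataset:

```r
# Hypothetical first look at turnout by winning party, assuming a hand-typed
# CSV with columns constituency, turnout_pct and winner (names are made up).
results <- read.csv("gb_constituencies_2015.csv", stringsAsFactors = FALSE)

# distribution of turnout in seats won by each party
boxplot(turnout_pct ~ winner, data = results,
        ylab = "Turnout (%)", las = 2)

# median turnout by winning party
tapply(results$turnout_pct, results$winner, median)
```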

There goes my weekend.


Election Preparation

In anticipation of some interesting results, I’ve been accumulating data on a constituency basis to be ready to do some analysis of the results of the UK general election on the 7th May.   I’m also playing with the current maps – how best to cope with shifting electoral boundaries.

I’ve deliberately avoided commenting on the polls here, as I would prefer that this remain as apolitical as possible. My thoughts on the current polls are that they are generally too broad – not capturing the local tactical voting that happens with first-past-the-post systems. Lord Ashcroft’s polls have been interesting as they have been done on a constituency basis; the great unknown is how much these polls may influence tactical voting within marginal seats.

The data has been tidied up sufficiently to produce turnout maps from the previous general election – the purpose of this was really to set up all the matching rules for how different organisations name constituencies.  It’s a surprising amount of faff – why couldn’t everyone work with the ONS codes (or at least include them in the datasets for easy matching!)?
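
To give a flavour of the faff, here is a sketch of the kind of matching rules involved; the data frames and their contents below are toy stand-ins, not my real files:

```r
# Toy stand-ins for two real files; names and values are invented.
results    <- data.frame(constituency = c("Ynys Môn", "Weston-super-Mare"),
                         turnout = c(70.1, 66.2), stringsAsFactors = FALSE)
boundaries <- data.frame(constituency = c("Ynys Mon", "Weston Super Mare"),
                         region = c("Wales", "South West"), stringsAsFactors = FALSE)

# Each normalisation step mirrors a way in which organisations' names differ.
normalise_name <- function(x) {
  x <- tolower(x)
  x <- iconv(x, to = "ASCII//TRANSLIT")  # "ynys môn" -> "ynys mon"
  x <- gsub("&", "and", x, fixed = TRUE)
  x <- gsub("[[:punct:]]", " ", x)       # commas, hyphens, apostrophes...
  gsub("\\s+", " ", trimws(x))           # collapse stray whitespace
}

# With ONS codes in both files none of this would be needed:
# merged <- merge(results, boundaries, by = "ons_code")
merged <- merge(
  transform(results,    key = normalise_name(constituency)),
  transform(boundaries, key = normalise_name(constituency)),
  by = "key"
)
merged
```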

Turnout map, Great Britain

I’ve also worked out the niceties of “zooming in” for dense areas that are difficult to see on a national map, with London used as an example.

Turnout map, London

So I think that I’m ready to go for analysis of the election results next week! Results should be interesting.


Gone quiet… reasons and excuses

So I’ve been rather lax with my posting schedule recently.  A combination of three things conspired against me writing anything sensible here.

  1. I’ve been contributing towards another piece for Significance – reviewing the methodologies of different polling companies in Great Britain (as most don’t bother with Northern Ireland) in advance of the General Election.
  2. Teaching – my final year class in applied statistical modelling had a major piece of coursework (modelling the number of arrivals per hour at a festival and using that information to inform the organisers about optimal arrangements for opening hours etc.), which meant that I was providing a lot of additional time to my students.  I enjoy this part of my teaching as it really allows students to develop beyond the scope of what can be taught in large groups, but it is extremely time consuming.
  3. Computer issues. I’ve had some pretty major computer issues recently, affecting both my personal laptop and my work machine.  Both had to be replaced (my work machine did give me the lovely warning of “imminent hard drive failure” just in time).  These caused numerous delays over the last month, so something had to give in my efforts to catch up.

Hopefully things will start to run a bit more smoothly (regression towards the mean would imply that this should be a reasonable assumption!) and I will be able to devote a bit more time to writing about the interesting stats-related things that are sure to be popping up more frequently in the run-up to the election.


Ranking educational institutions

Why rankings of institutions are not a great idea…

It is that time of the year again – when we hear about the rankings given to schools in the UK (discussed on Radio 4’s Today programme this morning). While much time was given to the fact that they have “moved the goalposts” by changing the criteria on which the rankings are assessed, not much time was given to the old chestnut that ranking schools is just not a good idea: the ranks are too unstable to be meaningful.

Another set of rankings that is topical (in Higher Education circles at least) is the one arising from the National Student Survey. BSc Mathematics programmes in general have quite high satisfaction scores and relatively small class sizes, so relative rankings can change quite dramatically year on year through nothing more than random variation around the true underlying “satisfaction rating” of a degree programme.

I decided to simplify the problem and to simulate it to illustrate the major problem.

I took 16 different percentages (or the associated probabilities of success / satisfaction), from 80% to 95%. For each of these values I simulated 5 sets of data – giving 80 institutions in total – each representing 15 years of data, with the potential sample sizes (numbers of respondents) varying at random between 45 and 55. I then modelled the number of successful (or satisfied) students and hence the proportions, which depend on the corresponding sample sizes. These proportions were then turned into annual ranks – so in each of the 15 “years” an institution would have been ranked between 1 and 80 (with, for clarity, the lower ranks indicating better performance).
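
A sketch of that simulation – the seed and the treatment of ties are my own choices, but the structure follows the description above:

```r
# Simulate 16 true satisfaction levels x 5 institutions each = 80 institutions,
# observed over 15 "years" with 45-55 respondents per year.
set.seed(1)  # arbitrary seed for reproducibility

true_p  <- rep(seq(0.80, 0.95, by = 0.01), each = 5)
n_inst  <- length(true_p)   # 80
n_years <- 15

n_resp <- matrix(sample(45:55, n_inst * n_years, replace = TRUE),
                 n_inst, n_years)
satisfied <- matrix(rbinom(n_inst * n_years, size = as.vector(n_resp),
                           prob = rep(true_p, n_years)),
                    n_inst, n_years)
prop <- satisfied / n_resp

# annual ranks: rank 1 = best (highest observed proportion)
ranks <- apply(prop, 2, function(x) rank(-x, ties.method = "min"))

apply(ranks, 1, median)  # median rank per institution (black dots)
apply(ranks, 1, range)   # year-on-year span of ranks (grey dashed lines)
```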

The graph illustrates the problem. The black dots indicate the median rank across the 15 “years” of data. The red dot represents the “true” ranking – located in the centre of each of the groups of institutions with tied true rankings. The grey dashed lines in the background show the span of values of the rankings given year-on-year.  Many of these dashed lines span almost the entire range of possible rankings!

Variation in Rankings

The moral of the story is that, when comparing many institutions with very similar performances, ranks are meaningless in practice. If anything needs to be compared, compare the raw values (%) so that people can see how little difference exists in practice. As you can see below, by treating each institution in isolation, the random variation over time is still visible, but people can judge for themselves whether any observed difference is large enough to cause real concern (a jump of 3 percentage points may make a dramatic difference to the ranking position when everyone is tightly bunched together).

Treating each institution in isolation

I chose the range 45-55 as it is a common range for the number of responses in the National Student Survey for BSc Mathematics degrees.  Looking at Bristol data for Key Stage 2, for the 29 schools with a minimum of 85% achieving level 4 or above in reading, writing and maths, the numbers of eligible students range from 20 to 91, with a median of 45 students [so the same range is sensible].