
Administrative data is a bit like Tinder: Other people seem to be using it, are you missing out?

Vernon Gayle, Roxanne Connelly, Chris Playford

University of Edinburgh

There is a buzz around administrative data: many people seem to be using it, so are you missing out?

Administrative data are records and information that are gathered in order to organise, manage or deliver a service. Although they are not primarily collected for research, some administrative data resources contain information on individuals that has great potential for sociological research. These datasets are best described as ‘administrative social science datasets’.

It is becoming increasingly common for administrative data to be linked to existing large-scale social survey datasets, but historically social scientists have only had highly restricted access to administrative records. The ESRC has funded the Administrative Data Research Network (ADRN), which aims to open up appropriate access to a plethora of data that have been locked away in databases and files. The goal is to provide researchers with access to data from government departments and other agencies that routinely collect data relevant to social research.

The new ADRN will allow researchers to gain carefully supervised access to data to undertake studies that are ethical and feasible. A critical feature of administrative social science data is that people cannot be identified and data cannot be linked back to individuals. This ensures that nobody’s privacy is infringed. The bar for gaining access to administrative social science data is set high because a great deal of work is required to link data and to get de-identified data ready for researchers to analyse. The outcome however will be unparalleled new sources of social science data suitable for sociological research. These data will support detailed empirical analyses of social and economic life in contemporary Britain.

Examples of plausible sociological analyses that could be undertaken with administrative social science data are legion, but we list a few to illustrate the diversity of administrative social science data sources, and to prime the reader’s sociological imagination.

  • Better understanding intergenerational income mobility with tax records from parents and their children later in adult life.
  • Investigating flows into and out of child poverty with linked tax and benefits records.
  • Investigating potential relationships between being in care in childhood and criminal careers in adulthood with linked data from social services and the criminal justice system.
  • Exploring the relationship between weather and children’s behaviour with linked meteorological data and school exclusion data.

Currently there is a growing fervour to describe administrative social science datasets as ‘big data’. This is rather unhelpful since the term ‘big data’ has been sloppily deployed to describe data as diverse as outputs from the Large Hadron Collider, scraped Twitter feeds, Facebook status updates, transactional data from supermarkets, and sensor and tracking information from mobile phones and GPS devices.

There is a risk in forcing administrative social science data under the ‘big data’ umbrella, and this should be avoided. There is an emerging idea that ‘big data’ heralds the end of the requirement for comprehensive statistical analyses because the sheer volume of data means that simple correlations are sufficient. The suggestion is that knowing ‘what’ but not ‘why’ will be more important in the ‘big data’ era. The recent poor predictions from Google Flu Trends are probably the most striking cautionary tale as to why simple correlations should be avoided. We could also point to any one of the humorous spurious correlations that are usually used when teaching sociology undergraduates (our favourite is the correlation between storks and fertility).
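As a toy illustration of why simple correlations mislead, the following sketch (in Python, with entirely made-up numbers rather than any data discussed above) generates two causally unrelated series that both drift upwards over time and therefore correlate strongly, even though the relationship evaporates once the shared trend is removed.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1990, 2015)

# Two series that are causally unrelated but both trend upwards over time,
# e.g. a hypothetical stork count and a hypothetical birth rate.
storks = 100 + 2.0 * (years - 1990) + rng.normal(0, 3, size=years.size)
births = 50 + 1.5 * (years - 1990) + rng.normal(0, 2, size=years.size)

# The raw correlation is spuriously high because both series share a trend.
print(np.corrcoef(storks, births)[0, 1])

# The correlation of year-on-year changes falls away towards zero,
# exposing the absence of any real relationship.
print(np.corrcoef(np.diff(storks), np.diff(births))[0, 1])
```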

The majority of sociological studies using administrative social science datasets will be examining a variable by case matrix that is very similar to those provided by large-scale social surveys. There is absolutely nothing that convinces us that when analysing a variable by case matrix of administrative social science data, we can ignore the helpful lessons that have emerged from decades of sociology, statistics and econometrics. For example if an administrative dataset has repeated measurements on the same individuals the usual problems associated with non-independence of observations or the possibility of residual heterogeneity will not vanish simply because of the size of the dataset, or because the data are from an administrative source rather than a social survey. Anyone who suggests that administrative social science datasets (or ‘big data’ more generally) have special immunity will have to work hard to persuade those of us who are knowledgeable and experienced social science data analysts.
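As a minimal sketch of the kind of standard technique this point refers to, the Python example below (the file admin_panel_extract.csv and the variables person_id, year, age and earnings are hypothetical) contrasts a naive pooled regression with a random-intercept model that allows for repeated measurements on the same individuals.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format administrative extract: one row per person per year.
df = pd.read_csv("admin_panel_extract.csv")  # assumed columns: person_id, year, age, earnings

# A naive pooled OLS model treats every row as an independent observation,
# which repeated measurements on the same people clearly are not.
pooled = smf.ols("earnings ~ age + year", data=df).fit()

# A random-intercept (multilevel) model allows for the clustering of
# observations within individuals, one standard response to non-independence.
multilevel = smf.mixedlm("earnings ~ age + year", data=df, groups=df["person_id"]).fit()

print(pooled.summary())
print(multilevel.summary())
```

The point is simply that such adjustments are about the structure of the data, not its volume, so they remain necessary however large the administrative extract is.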

The most notable positive feature of administrative social science datasets is that they are usually large in scale (although they are seldom as large as the ‘big datasets’ that are produced in areas such as particle physics). In many instances they will provide many more cases than existing large social surveys. Because the data are collected for administrative or organisational purposes, measures may be more accurate than if they had been collected from a participant in a study. For example, we can envisage that personal income information within an HMRC dataset might be more accurate than information collected in a semi-structured interview.

In some cases administrative datasets will not be a sample but will be an entire population. This is sometimes referred to as n=all. Therefore some of the well-known issues relating to representativeness (including sample selection bias, unit non-response and sample attrition) will be less prevalent.

The positive features of administrative social science datasets are appealing; however, like every other source of data relevant to sociological research, there are also notable negative features. At a practical level it may simply not be possible to link together some sources of administrative data because there is no suitable unique identifier available on each dataset to act as a key.
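In data-management terms, deterministic linkage of this kind amounts to joining tables on a shared key. The sketch below, in Python/pandas with hypothetical file and column names (tax_records.csv, benefit_records.csv, a de-identified person_id), shows why a common identifier is needed and how unmatched cases can be identified.

```python
import pandas as pd

# Two hypothetical de-identified administrative extracts.
tax = pd.read_csv("tax_records.csv")           # assumed columns: person_id, taxable_income, year
benefits = pd.read_csv("benefit_records.csv")  # assumed columns: person_id, benefit_type, year

# Deterministic linkage is only possible because both files share person_id.
linked = tax.merge(benefits, on=["person_id", "year"], how="inner")

# A left join keeps everyone in the tax file and flags who could not be linked,
# which is useful for assessing the coverage of the linked dataset.
coverage = tax.merge(benefits, on=["person_id", "year"], how="left", indicator=True)
print(coverage["_merge"].value_counts())
```

Without a shared key of this sort, linkage has to fall back on probabilistic matching of names, dates of birth and addresses, which may simply not be available in de-identified data.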

In comparison with large-scale social surveys, especially omnibus surveys such as Understanding Society (the UK Household Longitudinal Study), which is specifically designed to support multi-disciplinary secondary data analyses, the number of available explanatory variables in most administrative datasets will be extremely limited. For example, variables measuring a person’s ethnicity or level of education, which are implicated in many sociological analyses, might be completely absent from a dataset because they are administratively irrelevant.

Much of the information in large-scale social surveys is sociologically informed, and specialised measures and variables are collected. Such sociologically informed measures might not be available in administrative social science datasets. In some administrative datasets there will be suitable proxy measures, but in others the available proxies may be less suitable. For example, in an education-related dataset the measure of ‘eligibility for free school meals’ might seem like a suitable proxy for household disadvantage. ‘Eligibility for free school meals’ will, however, perform relatively poorly when compared with a sociologically informed measure such as the National Statistics Socio-Economic Classification in an analysis of school examination performance.

Much administrative data will naturally be of high quality in terms of both validity and reliability. Sociologists have a long track record of being reflexive and concerned about the research value of data, and similar thought must be given to the quality of measures within administrative datasets. The inaccuracies that individuals detect and the errors that emerge in transactions with the benefits agencies, tax authorities, transport agencies, the national health service and local authorities, coupled with the errors, miscalculations and inaccuracies that occur in transactions with service providers such as banks, credit card companies, utility companies, and delivery and transport providers, all hang a reasonable question mark over the quality of some administrative data for sociological research.

In some instances administrative datasets will be an entire population, but this cannot be assumed. Individuals may be missing from a dataset and will be hard to detect; it is likely that these missing individuals will be a special subset of the population. There may also be missing information on some measures within the datasets. This ‘missingness’ might be unimportant; however, the narrow range of other explanatory variables in the dataset will greatly restrict the scope for applying formal statistical methods for missing data.
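One reason the narrowness of the variable set matters is that model-based imputation borrows strength from other variables that are correlated with the incomplete measure. The hedged sketch below (Python/scikit-learn, with a hypothetical file and hypothetical column names) shows the basic mechanics; with few auxiliary variables there is little information for the imputation model to draw on.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical de-identified extract with gaps in the earnings measure.
df = pd.read_csv("admin_extract.csv")  # assumed columns: age, years_in_employment, earnings
predictors = ["age", "years_in_employment", "earnings"]

# Iterative (model-based) imputation predicts each missing value from the
# other variables; its usefulness depends on having informative auxiliary
# variables, which sparse administrative datasets may lack.
imputer = IterativeImputer(random_state=0)
df[predictors] = imputer.fit_transform(df[predictors])
```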

The ADRN will take on much of the burden of negotiating and securing access to data; however, some sources of data may still remain inaccessible. Gaining access to specific datasets held by some organisations can be prohibitively slow and therefore disadvantageous for shorter projects. Many datasets will only be available to be analysed in approved ‘safe settings’, and this places an extra burden on the data analyst. Because of the sensitivity of the data, some data providers will place controls on the output that sociologists produce, and typically extra time will be required for researchers to get results cleared for presentations and publications.

Training in the standard range of statistically informed multivariate data analysis techniques that are required for analyses of large-scale social surveys is also required for analysing administrative datasets. We contend that, given the scope and the restrictions of administrative data, some of the most fruitful sociological enterprises are likely to involve analyses of administrative data that have been linked to existing, well-designed large-scale social science studies. The ADRN is ground-breaking and offers support for intellectually exciting opportunities for sociological research using administrative data.

Don’t swipe left on administrative data.

Quantitative Methods Pedagogy in Sociology: What have we got to go on?

Kevin Ralston, University of Edinburgh, 2015

A systematic review of the social science research methods pedagogy literature, covering a decade, turned up 195 papers (Wagner et al., 2011). The review concluded that there is no pedagogical culture underlying the teaching of research methods in social science. Given that this conclusion applies to social science as a whole, taking one of the disciplines and then focussing on just quantitative methods (QM) pedagogy is unlikely to result in a different conclusion. Indeed, I have similarly surveyed the sociology QM pedagogy literature over a 25-year period, and cannot disagree with the finding of Wagner et al. (2011).

Over parallel timescales there have been several pushes to increase the level and capacity of QM teaching in UK social science – to name some: the Q-Step programme (http://www.nuffieldfoundation.org/q-step); the evolution of the ESRC Doctoral Training Centres; the AQMeN initiative to build quantitative methods capacity (https://www.aqmen.ac.uk/); and the British Academy’s ‘Count Us In’ report and suggested approach to a data skills strategy (www.britishacademy.ac.uk/countusin). The intentions here are laudable. Concepts such as methodological pluralism are cited (Payne et al., 2004), and any discipline which does not have all the available tools at its disposal is, in principle, weaker for it. In a note of optimism, Kilburn et al. (2014) suggest that the policy foregrounding of the importance of methods in social science, together with the ‘capacity building’ policy rhetoric, may lead to an increased emphasis on pedagogy. What it means, however, to have a policy to build methodological capacity in subject areas where there are few meaningful, evidenced pedagogical approaches to teaching practice is an open question.

Despite this paucity, there are grass roots of a literature in sociology which could potentially grow into a dynamic and fertile pedagogical field. One theme in the literature advocates or describes general approaches to QM teaching. These are often articles written from experience (someone who has taught the subject area for many years) or suggesting how things are done within an institution (this is how we/I do it here) (e.g. Atkinson et al., 2006; Auster, 2000; Fischer, 1996; Lindner, 2012; Strangfeld, 2013). This is a useful resource. We cannot simply discount the views of a professor who has spent a career teaching in the field. Similarly, a description of how an institution organises its QM teaching may very well give an example of best practice. Although it is useful for the field to build up this information, it is not a substitute for an empirically informed pedagogy. There is an element of received wisdom associated with this literature.

The empirical literature researching QM pedagogy in sociology is limited, but it begins to give us something to go on. Typically these studies cover a year in a course or an institution (Delucchi, 2014; Maltby, 2001; Murtonen et al., 2008; Murtonen and Lehtinen, 2003; Wills and Atkinson, 2007). There is also a stream in the published work which assesses an intervention in a course where, perhaps, a new mode of teaching or gimmick is tried (e.g. Delucchi, 2007; Schumm et al., 2002). The studies are often not especially rigorous and may be based on small class sizes, n < 100 (e.g. Maltby, 2001; Murtonen and Lehtinen, 2003; Pfeffer and Rogalin, 2012). There are, though, instances of larger studies where research has been done at institutional or departmental level, or over several years of teaching (Wilder, 2009; Williams et al., 2008). Often the evaluation is based upon student feedback (e.g. Auster, 2000) rather than building an experimental design to test the teaching practice (e.g. Raymondo and Garrett, 1998; Smith, 2003). It would be unfair to be over-critical of this small literature. The work has limitations; however, these studies are the only attempts to move towards generating an evidence base for our QM teaching practice. Despite their weaknesses, these studies are, in a sense, pioneering, and represent time taken to engage in an unfashionable but important field. This research provides a foundational contribution to the subject area.

Debate over the concept of student anxiety about statistics within social science encapsulates the issue of incomplete evidence being conflated with received wisdom. The belief that social science undergraduates are apprehensive about their studies related to maths, statistics, and quantitative methods in general is often cited in the literature (e.g. Bridges et al., 1998; Paxton, 2006; Schacht and Stewart, 1990). The concept of maths anxiety is not exclusive to social science; it is also seen as a problem within specifically numerate subjects (Henrich and Lee, 2011). DeCesare (2007) suggests the concept is overstated, showing that 40% of sociology students at a single U.S. institution who responded to a survey reported no angst. Williams et al. (2008) similarly found that only a slight majority reported angst in a sample of students in England and Wales. Some work has been undertaken comparing anxiety between academic fields. In a limited study comparing social science, health science, arts and hard science students, Hamza and Helal (2013) found no significant differences in the mean level of maths anxiety. However, a comprehensive comparison of anxiety levels, both between social science disciplines and between social science and more numerate subjects, has yet to be undertaken. In the absence of such a study, we do not know whether social science students are significantly more ‘frightened’ of numbers than their more numerate counterparts, or whether some social sciences are faring better than others in this respect.

Moving the discussion of statistical anxiety amongst social science students towards an evidential basis is necessary if we are to address it. If it is the case that there is little difference in levels of anxiety between subjects, then perhaps it is time to move beyond the anxiety framing, which could itself be unhelpful by acting as a kind of self-fulfilling prophecy (DeCesare 2007). By the same token, if there really is excessive angst in certain social sciences, then we need to employ clear, targeted strategies to address it, strategies based on evidenced pedagogy. At the moment it is uncertain whether social science, as a set of disciplines, is a special case of maths anxiety, or what can be implemented that has been shown to reduce the level of worry associated with the study of maths in relation to social science.

Until we have an evidenced understanding of what constitutes good practice in QM teaching, how do we know whether we are really improving capacity? Engendering an evidenced QM pedagogical practice for social science is a large proposition. It requires a generation of researchers who are interested and incentivised, bringing the production of research and literature to a critical mass. This could then be sustained by the availability of funding for developing QM teaching. Unless we achieve a reflective, verifiable pedagogical culture, it is questionable how successful the aim of capacity building can be.

References

Atkinson, M.P., Czaja, R.F., Brewster, Z.B., 2006. Integrating Sociological Research Into Large Introductory Courses: Learning Content and Increasing Quantitative Literacy. Teach. Sociol. 34, 54–64. doi:10.1177/0092055X0603400105
Auster, C.J., 2000. Probability Sampling and Inferential Statistics: An Interactive Exercise Using M&M’s. Teach. Sociol. 28, 379–385. doi:10.2307/1318587
Bridges, G.S., Pershing, J.L., Gillmore, G.M., Bates, K.A., 1998. Teaching quantitative research methods: A quasi-experimental analysis. Teach. Sociol. 26, 14–28. doi:10.2307/1318676
DeCesare, M., 2007. “Statistics anxiety” among sociology majors: A first diagnosis and some treatment options. Teach. Sociol. 35, 360–367.
Delucchi, M., 2014. Measuring Student Learning in Social Statistics: A Pretest-Posttest Study of Knowledge Gain. Teach. Sociol. 42, 231–239. doi:10.1177/0092055X14527909
Delucchi, M., 2007. Assessing the impact of group projects on examination performance in social statistics. Teach. High. Educ. 12, 447–460. doi:10.1080/13562510701415383
Fischer, H.W.I., 1996. Teaching Statistics from the User’s Perspective. Teach. Sociol. 24, 225–230. doi:10.2307/1318815
Hamza, E., Helal, A.M., 2013. Maths Anxiety in College Students across Majors: A Cross-Cultural Study. Educationalfutures 5.
Henrich, A., Lee, K., 2011. Reducing Math Anxiety: Findings from Incorporating Service Learning into a Quantitative Reasoning Course at Seattle University. Numeracy 4. doi:10.5038/1936-4660.4.2.9
Kilburn, D., Nind, M., Wiles, R., 2014. Learning as Researchers and Teachers: The Development of a Pedagogical Culture for Social Science Research Methods? Br. J. Educ. Stud. 62, 191–207. doi:10.1080/00071005.2014.918576
Lindner, A.M., 2012. Teaching Quantitative Literacy through a Regression Analysis of Exam Performance. Teach. Sociol. 40, 50–59. doi:10.1177/0092055X11430401
Maltby, J., 2001. Learning statistics by computer software is cheating. J. Comput. Assist. Learn. 17, 329–330. doi:10.1046/j.0266-4909.2001.00188.x
Murtonen, M., Lehtinen, E., 2003. Difficulties experienced by education and sociology students in quantitative methods courses. Stud. High. Educ. 28, 171–185. doi:10.1080/0307507032000058064
Murtonen, M., Olkinuora, E., Tynjala, P., Lehtinen, E., 2008. “Do I need research skills in working life?”: University students’ motivation and difficulties in quantitative methods courses. High. Educ. 56, 599–612. doi:10.1007/s10734-008-9113-9
Paxton, P., 2006. Dollars and sense: Convincing students that they can learn and want to learn statistics. Teach. Sociol. 34, 65–70.
Payne, G., Williams, M., Chamberlain, S., 2004. Methodological Pluralism in British Sociology. Sociology 38, 153–163. doi:10.1177/0038038504039372
Pfeffer, C.A., Rogalin, C.L., 2012. Three Strategies for Teaching Research Methods A Case Study. Teach. Sociol. 40, 368–376. doi:10.1177/0092055X12446783
Raymondo, J.C., Garrett, J.R., 1998. Assessing the introduction of a computer laboratory experience into a behavioral science statistics course. Teach. Sociol. 26, 29–37. doi:10.2307/1318677
Schacht, S., Stewart, B.J., 1990. What’s Funny about Statistics? A Technique for Reducing Student Anxiety. Teach. Sociol. 18, 52–56. doi:10.2307/1318231
Schumm, W.R., Webb, F.J., Castelo, C.S., Akagi, C.G., Jensen, E.J., Ditto, R.M., Spencer-Carver, E., Brown, B.F., 2002. Enhancing learning in statistics classes through the use of concrete historical examples: The Space Shuttle Challenger, Pearl Harbor, and the RMS Titanic. Teach. Sociol. 30, 361–375. doi:10.2307/3211484
Smith, B., 2003. Using and Evaluating Resampling Simulations in SPSS and Excel. Teach. Sociol. 31, 276–287. doi:10.2307/3211325
Strangfeld, J.A., 2013. Promoting Active Learning: Student-Led Data Gathering in Undergraduate Statistics. Teach. Sociol. 41, 199–206. doi:10.1177/0092055X12472492
Wagner, C., Garner, M., Kawulich, B., 2011. The state of the art of teaching research methods in the social sciences: towards a pedagogical culture. Stud. High. Educ. 36, 75–88. doi:10.1080/03075070903452594
Wilder, E.I., 2009. Responding to the Quantitative Literacy Gap Among Students in Sociology Courses. Teach. Sociol. 37, 151–170. doi:10.1177/0092055X0903700203
Williams, M., Payne, G., Hodgkinson, L., Poade, D., 2008. Does British sociology count? Sociology students’ attitudes toward quantitative methods. Sociol.- J. Br. Sociol. Assoc. 42, 1003–1021. doi:10.1177/0038038508094576
Wills, J.B., Atkinson, M.P., 2007. Table Reading Skills as Quantitative Literacy. Teach. Sociol. 35, 255–263. doi:10.1177/0092055X0703500304