Category Archives: Research Article

Mortality by occupation: Is occupation no more than a convenient category?

Kevin Ralston, York St John University 2018

Like me, the sociologists I have worked with tend to treat occupation as centrally important in their examinations of the social world. This reflects a belief in the prominence of occupation as an indicator (and often determinant) of outcomes in people’s lives. This belief is not necessarily shared by those from other disciplines.

I was fortunate to be involved in a recently published work which estimated mortality in the UK by occupational group[1]. The research was led by Dr Srinivasa Vittal Katikireddi. The field for which the analysis was undertaken was public health.

A response, published in the Lancet, to our article, asked the question ‘why choose occupation as the category for analysis? Why not, for example, analyse according to main hobby, or main place of shopping? The answer is partly because occupational data are available’ (Jessop 2017). The piece argued that categorising people by their main job is ambiguous and that other classifications may produce more useful insights, suggesting alternative measures based on hobbies or shopping location may be preferable.

It is certainly possible to hypothesise causal pathways between shopping habits or hobbies and mortality. If we knew the average saturated fat content of the weekly shop we could predict an increased likelihood of a number of diseases and begin to think about specific public health interventions to influence levels of fat consumption. Similarly, whether people regularly participate in hobbies that involve groups and/or physical activity correlates with mental wellbeing and physical health. Knowledge of factors that stimulate involvement in sports or social networks can be used to improve health outcomes.

That being said, it is unlikely that general measures of hobbies or place of shopping would tell us more than knowing an individual’s occupation. A paper by Connelly et al. (2016) describes occupation as the ‘most powerful single indicator of levels of material reward, social standing and life chances’. Indeed, occupation is likely to be a reasonable proxy of hobby types and is associated with shopping habits. What is more, people’s hobbies and shopping habits are outcomes influenced by occupational position. We know that social class background indicates whether people shop at Waitrose, play violin or are members of a golf club. On the other hand, it is difficult to imagine a realistic scenario where shopping at Sainsbury’s, being a keen angler or being involved in a book club could have a systematic influence on whether people are employed as teachers, carers or medical doctors.

The ongoing importance of public health analyses based upon occupation could be defended on a number of bases. Occupational analyses have a grand, long-run and robust theoretical underpinning. This is something categories such as hobby or favoured supermarket do not offer. This blog will not take the direction of constructing an argument in favour of occupation based on theory. Instead it will make a short general empirical justification in support of the use of occupation in public health analyses. I am (May 2018) working on a follow up paper to our research examining mortality by occupation. I thought I’d take a break from this to present a small piece of analysis which demonstrates something of the strength of association between an occupationally based measure and mortality.

Data

The data are from the ONS Longitudinal Study (LS) which contains linked census and life events for a 1% sample of the population of England and Wales. The LS has linked records at each census since the 1971 Census, for people born on one of four selected dates in a calendar year. These four dates were used to update the sample at the 1981, 1991, 2001 and 2011 Censuses. Life events data are also linked for LS members, including births to sample mothers, deaths and cancer registrations. New LS members enter the study through birth and immigration (if they are born on one of the four selected birth dates). From these data we have taken a sample of those present at the 2001 Census only. Death of a sample member is linked from administrative records. The outcome variable is the age standardised all-cause mortality rate (per 100,000 person-years). The sample comprises men aged 20-59 years. Additional information on the sample can be found in the paper.

Occupation was self-reported in the 2001 census, in response to the question “What is the full title of your main job?”. Responses to this question were used to derive Standard Occupational Classification (SOC) 2000 codes that are readily available in the data. The follow up period for death was until 2011. Because of disclosure control issues we used SOC at three digit ‘minor’ level. There are 81 occupational groups coded at this level, and we were able to report on 59 of these. From this we calculated European age standardised mortality rates and 95% confidence intervals by occupational group. The three digit SOC codes were used to apply a CAMSIS score to the occupational group. CAMSIS is an occupationally based measure of social stratification in the form of a scale of social distance and occupational advantage. More advantaged occupations score more highly on the scale, which ranges from 0 to 100 and is designed to have a mean of 50 for the general population (if you have not heard of or used CAMSIS before I suggest you check it out HERE, I highly recommend the measure).
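As an illustration of the outcome measure, the sketch below computes a directly age standardised mortality rate per 100,000 person-years for a single occupational group. The age bands, deaths, person-years and standard population weights are made-up numbers for illustration only. They are not LS data, they are not the European standard population, and confidence intervals are left out.

```python
def age_standardised_rate(deaths, person_years, std_weights, per=100_000):
    """Directly age standardised rate: a weighted mean of age-specific rates."""
    if not (len(deaths) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age band")
    total_weight = sum(std_weights)
    weighted = sum(
        (d / py) * w for d, py, w in zip(deaths, person_years, std_weights)
    )
    return per * weighted / total_weight


# Hypothetical occupational group with four age bands (20-29, 30-39, 40-49, 50-59).
deaths = [2, 4, 10, 30]
person_years = [10_000, 10_000, 10_000, 10_000]
std_weights = [1, 1, 1, 1]  # illustrative equal weights, not a real standard population

rate = age_standardised_rate(deaths, person_years, std_weights)
print(round(rate, 1))
```

Because every group is weighted to the same standard age structure, differences between occupational groups are not simply a reflection of some occupations having older workers than others.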

Results

Figure 1

Figure 1 describes the relationship between CAMSIS and the mortality rate. The first graph shows estimated mortality with confidence intervals for the occupational group. The second shows only the point estimates for the occupational group, a linear fit line and a quadratic curve of the association with mortality. A strong correlation is evident between CAMSIS and the mortality rate (r = -0.79). Although there is a good deal of overlap in confidence intervals for many occupations, the pattern of association is clear: more advantaged occupations tend to have lower estimated mortality.
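For readers who want to see how a figure like this is summarised, here is a minimal sketch of a Pearson correlation and a linear fit. The CAMSIS scores and mortality rates are invented values for five hypothetical occupational groups, not the estimates behind Figure 1, and the quadratic curve is not reproduced.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented values: one point per hypothetical occupational group.
camsis = [30, 40, 50, 60, 70]
mortality = [400, 330, 300, 240, 230]

r = pearson_r(camsis, mortality)
intercept, slope = linear_fit(camsis, mortality)
print(round(r, 2), round(intercept, 1), round(slope, 2))
```

A negative slope here corresponds to the pattern described above: higher CAMSIS scores go with lower mortality rates.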

Conclusion

Correlation is not causation. Occupation in conjunction with all-cause mortality is limited in terms of its utility in ‘explaining’ the gradient in mortality observed. The estimated differences will be due to a range of factors, many of which are not directly applicable to the occupation, but which may be materially associated. That being said, it is certainly possible to identify direct testable hypotheses based on occupation. For example, recent work has shown that it is likely that firefighters experience increased rates of cancer because of contaminated equipment. This built upon more general work noting a higher incidence of cancer amongst firefighters. Questions I often wonder about, but have not had time to take further include: what is the risk of serious respiratory disease to delivery drivers who work in large cities versus those in rural areas? Are those in the new gig economy disproportionately affected?

These are points similar to those made by Jessop in commenting on our article. Nevertheless, it is necessary to firmly rebut the idea that we study occupation simply because it is what is available (whilst measures of hobby or favoured grocery shop are not). The small piece of analysis here demonstrates something of the magnitude of the association between an occupationally based measure and a measure of mortality. This is in line with Connelly et al.’s (2016) description of occupation as the ‘most powerful single indicator of levels of material reward, social standing and life chances’. There has been a long history of interdisciplinary overlap between sociology and public health. There is great potential for research drawing sociologically upon occupation as a basis for analyses of public health outcomes. Far from being a category that should be replaced, I would suggest occupation remains under-exploited in public health research.

Acknowledgments

This study received no specific funding. SVK is funded by a NHS Research Scotland Senior Clinical Fellowship (SCAF/15/02). SVK and AHL are funded by the Medical Research Council (MC_UU_12017/13 & MC_UU_12017/15) and Scottish Government Chief Scientist Office (SPHSU13 & SPHSU15). DS is funded by the Wellcome Trust Investigator Award (100709/Z/12/Z) and the European Research Council (HRES-313590).

The permission of the Office for National Statistics (ONS) to use the Longitudinal Study is gratefully acknowledged, as is the help provided by staff of the Centre for Longitudinal Study Information and User Support (CeLSIUS). CeLSIUS is supported by the ESRC Census of Population Programme (award reference ES/K000365/1). The authors alone are responsible for the interpretation of the data.

Statistical data from ONS is Crown Copyright. Use of the ONS statistical data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. This work uses research datasets that might not exactly reproduce ONS aggregates.

[1] The paper was also co-authored by Prof Alastair H Leyland, Prof Martin McKee and Prof David Stuckler

 

Research with our trousers down: Publishing sociological research as a Jupyter Notebook

Roxanne Connelly, University of Warwick

Vernon Gayle, University of Edinburgh

On the 29th of May 2017 the University of Edinburgh hosted the ‘Social Science Gold Rush Jupyter Hackathon’. This event brought together social scientists and computer scientists with the aim of developing our research and data handling practices to promote increased transparency and reproducibility in our work. At this event we contemplated whether it might ever be possible to publish a complete piece of sociological work, in a mainstream sociology journal, in the form of a Jupyter Notebook. This November, 6 months after our initial idea, we are pleased to report that the paper ‘An investigation of social class inequalities in general cognitive ability in two British birth cohorts’ was accepted in the British Journal of Sociology accompanied by a Jupyter Notebook which documents the entire research process.

Jupyter Notebooks allow anyone to interactively reproduce a piece of research. Jupyter Notebooks are already effectively used in ‘big science’, for example the Nobel Prize winning LIGO project makes their research available as Jupyter Notebooks. Providing statistical code (e.g. Stata or R code) with journal outputs would be a major step forward in sociological research practice. Jupyter Notebooks take this a step further by providing a fully interactive environment. Once a researcher has downloaded the requested data from the UK Data Archive, they can rerun all of our analyses on their own machine. Jupyter Notebooks encourage the researcher to engage in literate programming by clearly documenting the research process for humans and not just computers, which greatly facilitates the future use of this code by other researchers.

When presenting the results of social science data analyses in standard journal articles we are painfully confined by word limits, and are unable to describe all of the steps we have taken in preparing and analysing complex datasets. There are hundreds of research decisions undertaken in the process of analysing a piece of existing data, particularly when using complex longitudinal datasets. We make decisions on which variables to use, how to code and operationalise them, which cases to include in an analysis, how to deal with missing data, and how to estimate models. However, only a brief overview of the research process and how analyses have been conducted can be presented in a final journal article.

There is currently a replication crisis in the social sciences, in which researchers are unable to reproduce the results of previous studies. One reason for this is that social scientists generally do not prepare and share detailed audit trails of their work, which would make all of the details of their research available to others. Currently researchers tend to place little emphasis on undertaking their research in a manner that would allow other researchers to repeat it, and approaches to sharing details of the research process are ad hoc (e.g. on personal websites) and rarely used. This is particularly frustrating for users of infrastructural data resources (e.g. the UK’s large scale longitudinal datasets provided by the UK Data Service), as these data can be downloaded and used by any bona fide researcher. Therefore it should be straightforward, and commonplace, for us to duplicate and replicate research using these data, but sadly it is not. We see the possibility of a future of social science research where we can access full information about a piece of research, and duplicate or replicate the research to ultimately develop research more efficiently and effectively to the benefit of knowledge and society.

The replication crisis is also accompanied by concerns of scientific malpractice. It is our observation that P-hacking is a common feature of social science research in the UK. This is not a statistical problem but a problem of scientific conduct. Human error is also a possible source of inaccuracy in our research outputs, as much quantitative sociological research is carried out by single researchers in isolation. Whilst co-authors may carefully examine outputs produced by colleagues and students, it is still relatively rare to request to examine the code. In developing our Jupyter Notebook we have borrowed two techniques from software development, ‘pair programming’ and ‘code peer review’. Each of us repeated the research process independently using a different computer and software set-up. This was a laborious process, but labour well spent in order to develop robust social science research. This process made apparent several problems which would otherwise have been overlooked. At one point we were repeating our analysis whilst sharing the results over Skype, and frustratingly models estimated in Edinburgh contained 7 fewer cases than models estimated in Coventry. After many hours of investigation we discovered that different versions [1] of the same dataset, downloaded from the UK Data Archive, contained slightly different sample numbers.
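A minimal sketch of the kind of check that could catch a version mismatch early is given below. It assumes the data have been exported to CSV with a header row, which is an assumption made for the example and not a description of the files we used. The function name, file names and file contents are hypothetical.

```python
import csv
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def fingerprint(path):
    """Return a SHA-256 digest, data row count and check time for a CSV file."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    with open(path, newline="", encoding="utf-8") as fh:
        rows = sum(1 for _ in csv.reader(fh)) - 1  # exclude the header row
    return {
        "sha256": digest.hexdigest(),
        "rows": rows,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


# Demonstration with two tiny hypothetical files that differ by one case.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "version_a.csv")
    path_b = os.path.join(tmp, "version_b.csv")
    with open(path_a, "w", newline="", encoding="utf-8") as fh:
        fh.write("id,score\n1,10\n2,12\n3,9\n")
    with open(path_b, "w", newline="", encoding="utf-8") as fh:
        fh.write("id,score\n1,10\n2,12\n")
    a, b = fingerprint(path_a), fingerprint(path_b)

same_file = a["sha256"] == b["sha256"]
case_difference = a["rows"] - b["rows"]
print(same_file, case_difference)
```

If two collaborators record these values at the top of a notebook, a differing digest or row count flags that they are not working from the same file before any models are estimated.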

We describe this work as ‘research with our trousers down’ [2], as publishing our full research process leaves us open to criticism. We have already faced detailed questions from reviewers which would not have occurred if they did not have access to the full research code. It is also possible that other researchers will find problems with our code, or question the decisions which have been made. But criticism is part of the scientific process, and we should be placing ourselves in a position where our research can be tested and developed. British sociology lags behind several disciplines, such as Politics and Psychology, in the drive to improve transparency and reproducibility in our work. As far as we are aware there are no sociology journals which require researchers to provide their code in order to publish their work. It is most likely only a top-down change from journals, funding bodies or data providers which would develop the practices within our discipline. Whilst British sociologists are not yet talking about the ‘reproducibility crisis’ with the same concern as psychologists and political scientists, we have no doubts that increased transparency will bring great benefits to our discipline.

[1] This problem is additionally frustrating as the UK Data Service do not currently have an obvious version control protocol, and do not routinely make open sufficient metadata for users to be able to identify precise versions of files and variables. We have therefore documented the date and time that datasets were downloaded in our Jupyter Notebook. Doubtlessly, the UK Data Service adopting a clear and consistent version control protocol would be of great benefit to the research community as it would accurately locate data within the audit trail.
[2] We thank our friend Professor Robin Samuel for this apposite term.

Generations of worklessness, a myth that won’t die

Kevin Ralston, York St John University, 2017

The idea that there are multiple generations of the same family who have never had a job has popular, political and international resonance. In politics, UK Minister, Chris Grayling, is on record as stating there are ‘four generations of families where no-one has ever had a job’.

This belief in ‘generations of worklessness’ is often accompanied by the idea that there is an associated culture of worklessness. For example, Esther McVey, when she was Minister of State for Employment, made reference to the widespread notion that there is a ‘something for nothing culture’ among some of those claiming benefits.

Politicians of the red variety have also expressed similar sentiments. In a speech, where he discussed levels of worklessness in the UK, former Labour Prime Minister, Tony Blair, claimed that, behind the statistics, there were some households which have three generations who have never worked.

Ideas associated with generations of worklessness also regularly appear in the traditional UK print media. In 2013 the Daily Mail[1] reported a story about an individual who was convicted of burning down his house, which resulted in deaths. They used his status as a benefits claimant in order to characterise living on welfare benefits as a ‘lifestyle choice’ for some. This point is irrelevant to the human tragedy described but it is useful in spreading the notion of a benefits culture.

These recent examples have been foreshadowed by long running historical and academic debate. A report for the Department for Work and Pensions suggested versions of ideas like generations or cultures of worklessness have been around for 120 years. Michael B. Katz argues that themes of these types have characterised U.S. welfare for 200 years.

In US politics the idea of the ‘welfare queen’ has been used to justify policy in a similar manner to the UK’s ‘benefits cheats’ stereotype and the general notion, that there is a section of undeserving poor who should receive punishment or correction, is a key aspect of neo-liberal politics.

Underclass theory provides a theoretical expression of the type of thinking present in the generations theses. Central to underclass theory is the idea that generations have been socialised into worklessness.  More widely the theory puts forward that problems of illegitimacy and crime negatively define sections of society (the underclass).

We have undertaken newly published research which searched for three generations of worklessness. This used data collected over time to assess whether there is any truth in the sorts of claims made by people like Chris Grayling. This research was the first to use representative data (British Household Panel Survey) to directly test whether three generations of worklessness could be identified in the UK. We found no evidence to support the belief that there are large numbers of families who have several generations that have never worked.

Although ideas around generations of worklessness are widely expressed and have a long running history, the evidence does not support the theory. Lindsey Macmillan, an economist from University College London, estimated the numbers of families, from within the same household, in which there are two generations who have never worked. This was found to be a fraction of a percent. Other research has found similar results. A small scale study, which also looked for three generations of worklessness within deprived areas, could not find any such families.

The idea that there are generations of workless, who live in a culture of worklessness, creates a picture that there are large numbers of people trained to expect ‘something for nothing’. Arguments made in support of this type of thinking tend to be self-serving and used to push an agenda ignoring the structural problems that lead to people being unemployed.

The available evidence is against the existence of generations of worklessness. There is an ethical imperative on those involved in journalism, or formulating policy, to, at least, have an awareness of this evidence. Those, in these fields, who maintain these ideas are, at best, ignoring available evidence and at worst, wilfully misrepresenting reality.

In the absence of supporting evidence it is time to end over a century of debate. We need to do away with the pathological idea that there are large numbers of people in receipt of welfare benefits because they come from families that are too lazy to work.

 

[1] I have included this link here in a footnote, as I do not wish to encourage people to visit the Daily Mail web site and contribute to their advertising revenue: http://www.dailymail.co.uk/news/article-2304804/Mick-Philpott-benefits-culture-David-Cameron-backs-George-Osborne-saying-arson-case-raises-questions-welfare-lifestyle-choice.html (accessed 30/01/17)

The Determinants of Charity Misconduct

Diarmuid McDonnell & Alasdair Rutherford, 2017

As Corrado “Junior” Soprano, plagiarising a Chinese curse of dubious provenance, puts it: may you live in interesting times. Charities in the UK have been the subject of intense media, political and public scrutiny in recent years, resulting in three parliamentary inquiries. Public confidence and trust in the sector has been questioned in light of various “scandals” including unethical fundraising practices (resulting in the establishment of a new fundraising regulator for England and Wales in 2016), high levels of chief executive pay, politically-motivated lobbying and advocacy work, and poor financial management. Using novel data supplied by the Office of the Scottish Charity Regulator (OSCR), my colleague Dr Alasdair Rutherford and I describe the nature and extent of alleged and actual misconduct by Scottish charities, and ask what organizational and financial factors are associated with this outcome?

Background

First, some background on what we mean when we say “charity”. The Scottish Charity Register is maintained by OSCR which was established in 2003 as an Executive Agency and took up its full powers when the Charities and Trustee Investment (Scotland) Act 2005 came into force in April 2006. In Scotland, a charity is defined (under statute) as an organization that is listed on the Register after demonstrating that it passes the charity test: it must have only charitable purposes; it must provide, or intend to provide, some form of public benefit; it must not allow its assets to be used for non-charitable purposes; it cannot be governed or directed by government ministers; and it cannot be a political party. One of OSCR’s main responsibilities is to identify and investigate apparent misconduct and protect charity assets. It operationalises this duty by opening an investigation (what they term an inquiry) into the actions of a charity suspected of misconduct and other misdemeanours.

Investigations are mainly initiated as a result of a public complaint but they can also be opened by a referral from a department in OSCR or another regulator. For example, one of the founders of the charity The Kiltwalk reported the organization to OSCR on the grounds that he has concerns over the amount of funds raised by the organization that are spent on meeting the needs of beneficiaries. OSCR can only deal with concerns that relate to charity law – such as damage to charitable assets or beneficiaries, misconduct or misrepresentation – though it can refer cases to other bodies such as when criminal activity is suspected. Finally, the outcome is recorded for each investigation. Outcomes are varied and often specific to each investigation but most can be related to three common categories: no action taken or necessary; advice given; and regulatory intervention.

Method

This study examines two dimensions of charity misconduct that deserve greater attention: regulatory investigation and subsequent action. Regulatory action can take the following two, broad forms: the provision of advice (e.g. recommending a charity improve its financial controls to counteract the threat of fraud or misappropriation) and the use of OSCR’s formal regulatory powers (e.g. reporting the charity to prosecutors or suspending trustees). This study overcomes many of the limitations outlined previously by utilising a novel administrative dataset, derived from OSCR, covering the complete population (current and historical) of registered Scottish charities. It is constructed from three sources: the Scottish Charity Register, which is the official, public record of all charities that have operated in Scotland; annual returns, which are used to populate many of the fields on the Register (e.g. annual gross income); and internal OSCR departmental data relating to misconduct investigations. Once linked using each observation’s Scottish Charity Number, this dataset contains 25,611 observations over the period 2006-2014.
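The linkage step can be pictured with a small sketch. The field names, charity numbers and values below are hypothetical and the structure is simplified to one row per charity, whereas the real dataset holds repeated observations over 2006-2014. It is an illustration of linking on a shared identifier, not the code used for the study.

```python
def link_records(register, annual_returns, investigations):
    """Attach annual return and investigation fields to each Register entry."""
    linked = []
    for charity in register:
        key = charity["charity_number"]
        row = dict(charity)
        # Charities without an annual return keep a missing income value.
        row.update(annual_returns.get(key, {"gross_income": None}))
        # Charities absent from the investigations data were not investigated.
        row.update(investigations.get(key, {"investigated": 0}))
        linked.append(row)
    return linked


# Hypothetical records keyed on a made-up Scottish Charity Number.
register = [
    {"charity_number": "SC000001", "name": "Example Trust"},
    {"charity_number": "SC000002", "name": "Sample Fund"},
]
annual_returns = {"SC000001": {"gross_income": 120_000}}
investigations = {"SC000002": {"investigated": 1}}

linked = link_records(register, annual_returns, investigations)
for row in linked:
    print(row)
```

Starting from the Register keeps every charity in the linked file, so organizations that were never investigated still appear with the outcome coded 0.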

The outcome of being investigated by the regulator is measured using a dichotomous variable that has the value 1 if a charity has been investigated and 0 if not. The other two dependent variables are also dichotomous: regulatory action takes the value 1 if a charity has had regulatory action taken against it and 0 if not; and intervention takes the value 1 if a charity is subject to regulatory intervention and 0 if not (i.e. it received advice instead). All three dependent variables are modelled using binary logistic regression. We model the probability of investigation as a function of organization size, age, institutional form, field of operations and geographical base. For the sub-sample of organizations that were investigated, we then model the probability of regulatory action, and its different forms, being taken based on the same characteristics plus the source of the complaint made.

Describing Investigations and Regulatory Action

There have been 2,109 regulatory investigations of 1,566 Scottish charities over the study period: this represents six percent of the total number of organizations active during this time. The number of investigations increased steadily during OSCR’s early years and then plateaued at around 400 per year until 2013/14, when the figure declined slightly. The majority of investigations (78 percent) concerned charities that were investigated only once in their history. A little over 30 percent of investigations resulted in regulatory action being taken against a charity: 16 percent received advice and 13 percent experienced intervention by OSCR.

It is a member of the public that is most likely to contact OSCR with a concern about a charity. Internal stakeholders of the charity account for 31 percent of all investigation initiators, though this disregards the strong possibility that many of those recorded as anonymous are involved in the running of the charity they have a concern about. The concerns that prompt these actors to raise a complaint with OSCR are numerous and diverse. Figure 1 below visualizes the associations between the most common types of complaint and the response of the regulator. The overriding concern is general governance, as well as associated issues such as the duties of trustees and adherence to the founding document. Financial misconduct also ranks highly, particularly the misappropriation of funds and suspicion of financial irregularity.

Figure 1. Association between type of complaint and regulator response

Note: Each complaint can have two types, and maps to one of the regulatory responses. The fifteen most common complaint types are shown. The thickness of the line is proportional to the number of complaints leading to each regulatory response.

Modelling the Risk of Investigation and Action

In Table 1, we report the odds ratios (exponentiated coefficients) rather than the log odds as they approximate the relative risk of each outcome occurring. This is appropriate not only for ease of interpretation but because the absolute chance of either outcome occurring is low (i.e. it is better to know which charities are more likely relative to their peers). The category with the most observations is chosen as the base category for each nominal independent variable.
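The point that odds ratios approximate relative risk when an outcome is rare can be checked with a short calculation. The coefficient and baseline probabilities below are hypothetical values chosen for illustration, not estimates from our models.

```python
from math import exp, log


def odds_ratio(coefficient):
    """Exponentiate a logistic regression coefficient (log odds) to an odds ratio."""
    return exp(coefficient)


def relative_risk(odds_ratio_value, baseline_probability):
    """Relative risk implied by an odds ratio at a given baseline probability."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio_value
    new_probability = new_odds / (1 + new_odds)
    return new_probability / baseline_probability


or_value = odds_ratio(log(2.0))  # hypothetical coefficient giving an odds ratio of 2

rr_rare = relative_risk(or_value, 0.01)    # rare outcome
rr_common = relative_risk(or_value, 0.40)  # common outcome

print(round(or_value, 2), round(rr_rare, 2), round(rr_common, 2))
```

With a one percent baseline the implied relative risk is close to the odds ratio, whereas with a forty percent baseline the two diverge noticeably.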

Table 1. Results of Logistic Regression on dependent variables

We first examine the effects of organization age and size on the outcomes. The coefficient for age varies across the three outcomes. A one-unit increase in the log of age results in a five percent decrease in the odds of being investigated or being subject to regulatory action; however, the odds of experiencing intervention compared to receiving advice are higher for older charities. There appears to be a clear income gradient present in the investigation model: as organization size increases so do the odds of being investigated compared to the reference category. With regard to the actor that initiates an investigation, it appears that stakeholders with a monitoring role (e.g. funders, auditors or other regulators) are more likely than members of the public to report concerns that warrant some form of regulatory action; in contrast, internal charity stakeholders such as employees and volunteers have higher odds of identifying concerns that merit the provision of advice by OSCR and lower odds of triggering regulatory intervention in their charity. While size predicts complaints, it is the source of the complaint that is a more reliable predictor of the need for regulators to take action.

A more nuanced examination of the effect of organization size is possible by comparing categories of this variable to each other and not just the base category (shown in Figure 2). Drawing on suggestions by Firth (2003), Firth and Menezes (2004), and Gayle and Lambert (2007), we employ quasi-variance standard errors to ascertain whether categories of organization size are significantly different from each other. Unsurprisingly, the largest charities have significantly higher odds than all other categories; however it appears that the middle categories (charities with income between £100,000 and £1m) are not significantly different from each other and neither are organizations between £500,000 and £10m.
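The practical appeal of quasi-variances is that the variance of the difference between any two categories can be approximated by adding their quasi-variances, so comparisons do not require the full covariance matrix. The sketch below shows that calculation with hypothetical log odds and quasi-variances; the function name and all numbers are invented for illustration.

```python
from math import sqrt


def compare_categories(est_a, qv_a, est_b, qv_b, z=1.96):
    """Compare two category estimates using their quasi-variances.

    The variance of (est_a - est_b) is approximated by qv_a + qv_b.
    Returns the difference, its approximate standard error, and whether
    the approximate 95% interval for the difference excludes zero.
    """
    diff = est_a - est_b
    se = sqrt(qv_a + qv_b)
    lower, upper = diff - z * se, diff + z * se
    return diff, se, (lower > 0 or upper < 0)


# Hypothetical log odds and quasi-variances for three income categories.
largest = (1.20, 0.01)
middle_a = (0.50, 0.02)
middle_b = (0.40, 0.03)

diff_1, se_1, sig_1 = compare_categories(*largest, *middle_a)
diff_2, se_2, sig_2 = compare_categories(*middle_a, *middle_b)

print(round(diff_1, 2), round(se_1, 3), sig_1)
print(round(diff_2, 2), round(se_2, 3), sig_2)
```

In this made-up example the largest category differs from a middle category, while the two middle categories cannot be distinguished from each other, which mirrors the kind of comparison made above.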

Figure 2. Quasi-Variance log odds of being investigated

Conclusion

The results of the multivariate analysis point to the factors associated with charity investigation and misconduct, showing the mismatch between those predicting complaints and those predicting regulatory action. This has considerable implications for charity regulators seeking to deploy their limited resources effectively and in a way that ultimately protects and enhances public confidence. By revealing the disconnect between the level of complaints and concerns that require regulatory action, we argue there is much work to do for practitioners in the sector with regard to charity reputation and stakeholder communication. Charity boards are ultimately responsible for the governance of their organization, and must ensure that adequate policies and procedures are in place. This includes reducing the risk of misconduct occurring, taking corrective action in response to guidance from the regulator, and developing the management and reporting functions required to deal with the consequences. Recognition should also be given to the role that stakeholders such as funders and auditors must play in self-regulation of the sector, given their proximity to charities through their day-to-day activities. It is no longer sufficient (if indeed it ever was) to rely on charity status to convey trust and inspire confidence in the conduct of an organization.

Using Quantitative Methods to study Big Data skills: considering relevant proxies for ‘Big Data’ skills

Alana McGuire, University of Stirling, 2016

Background

This blog is based upon work being undertaken for a PhD at the University of Stirling which explores the impact of Big Data on skill requirements for employers in Scotland. A version of this article was presented as a poster at the National Centre for Research Methods Festival, Bath, July 2016. The project research design applies mixed methods using a hybrid adaptation of the explanatory sequential design (Creswell and Clark, 2011). Questions the study will address include: How is Big Data changing skill demands for employers? Is data becoming a more central part of organisations, and if so, is this causing changes in the job roles of employees in the organisation? Are there discrepancies between the skills that employees are being equipped with on training courses and the skills that employers are seeking? Is there evidence of social/gender/ethnic inequalities in Big Data skills?

The definition of Big Data is contested. The term ‘Big Data’ in the context of this project refers to complex data that requires a change in what is actually perceived as data (Lagoze, 2014). This may be structured in a conventional dataset or unstructured, for example, data from a health device or Twitter. This can also take a variety of formats. The size of the data itself is not the defining characteristic for the purposes of my research.

Mellody (2014: 10) argues that the main skills needed to work with Big Data are ‘computing and software engineering’, ‘machine learning’ and ‘optimization’. Machine learning focuses on ‘how to get computers to program themselves’ (Mitchell, 2006: 1). By optimization, Mellody is referencing ‘database optimization’, that is, the programming of the database so that commands are executed and results obtained in the quickest way possible (Mullins, 2010). As well as these skills, Yiu (2012) argues that Big Data specialists must also have ‘soft’ skills such as good communication, collaboration and creativity. Further to this, Yiu suggests that critical consumption of data and statistical methods are skills which have been neglected by the literature exploring the abilities needed to work with Big Data. Although the need for these skills may not be unique to Big Data, it is essential when working with Big Data that the analyst understands which methods are appropriate and how to interpret output from them.

Routinely collected and deposited data sources, such as the Labour Force Survey (ONS, 2016), do not capture variables which encompass the combination of skills discussed in the literature as necessary in the practice of analysis using Big Data. A key issue is therefore to find proxies that can robustly measure Big Data skills. Given this dearth of resources, a plausible alternative strategy may be available in the Employer Skills Survey (ESS). These data contain some information on skills shortages which can be used to assess need within sectors of the economy, for example, data on numeracy, IT, and communication skill shortages (see UK Commission for Employment and Skills, 2016). In addition to the ESS, the 1970 British Cohort Study (BCS) tested ability in maths, and several of the items used in this test are related to those abilities considered definitive of Big Data skills.

The remainder of this post outlines two proxy measures that could be relevant to understanding the prevalence of skills associated with working with Big Data.

Data and Methods

The Employer Skills Survey is a large scale survey conducted annually by the UK Commission for Employment and Skills (2016). For the 2013 survey, 91,279 interviews were completed. This survey is one of the largest of its kind in the UK, providing a wealth of data surrounding skills shortages in the UK.

The Employer Skills Survey was used to define a variable that measures the basic skills needed in an industry in order to potentially make use of data. This took the form of a score constructed from several skill shortage variables, including communication, numeracy, and IT skills. Graph 1, below, shows the distribution of this variable. The skill score has a mean of 1.89 and ranges from zero to six, with zero indicating no difficulty finding Big Data base skills and six indicating difficulty finding every one of the Big Data base skills. A high score on this variable can be taken to indicate that the particular industry or organisation finds it difficult to recruit employees with the skills identified as necessary for working with Big Data. It would be particularly problematic if the industries lacking these skills were ones which could benefit from the analysis of Big Data.
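To make the construction of such a score concrete, here is a minimal sketch which assumes the score is a simple count of six binary shortage indicators. That is an assumption for illustration; the item names are hypothetical placeholders and are not actual ESS variable names.

```python
# Hypothetical indicator names. Communication, numeracy and IT shortages are
# mentioned in the text; the remaining items are placeholders used only to
# reach the stated 0-6 range.
SHORTAGE_ITEMS = [
    "communication",
    "numeracy",
    "basic_it",
    "advanced_it",
    "placeholder_item_5",
    "placeholder_item_6",
]


def skill_score(employer):
    """Count how many of the six skills an employer reports difficulty finding.

    `employer` is a dict mapping item names to 1 (shortage reported) or 0.
    Missing items are treated as no reported shortage.
    """
    return sum(1 for item in SHORTAGE_ITEMS if employer.get(item, 0) == 1)


# Three made-up employer records, for illustration only.
employers = [
    {},  # no shortages reported
    {"communication": 1, "numeracy": 1},
    {item: 1 for item in SHORTAGE_ITEMS},  # every shortage reported
]

scores = [skill_score(e) for e in employers]
mean_score = sum(scores) / len(scores)
```

A count of this kind treats each shortage as equally important; a weighted score would be an alternative design if some skills were judged more central than others.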

Graph 1


Applying this variable makes it possible to assess which industries, or sectors, may be experiencing a shortfall in recruitment of the core skills. A simple analysis is presented using OLS regression, controlling for organisation type (non-market organisations compared with profit-making companies), organisation size (being an SME, a small or medium-sized enterprise, or not), whether the organisation is based in Scotland rather than the rest of the UK, and an interaction term between being based in Scotland and being an SME. If there is a skills shortfall of this type the effort and expenditure required to upskill staff from a poor basis would be far greater.

Alongside the Employer Skills Survey analysis, I have undertaken some initial analysis using the British Cohort Study. This study takes a group of babies born in a week in 1970 and follows these individuals throughout their lives. A follow-up study gave an arithmetic test to a sample of the cohort at age sixteen. Many of the questions in this test are highly relevant for understanding statistics and data distributions. Further to this, there are also datasets from later studies which contain socioeconomic information on the same individuals. One avenue for these data in my study is to consider the score on the arithmetic test as a proxy for Big Data skills. This presumes that statistical literacy is a key element of Big Data skills, and at the moment it is unclear that this is the case. If we assume that mathematical and statistical abilities are important aspects of working with Big Data, then the distribution of these skills in the population could relate to whether sectors of the economy are able to tap into these skills. The seven-class National Statistics Socio-economic Classification (NS-SEC) is used to estimate the level of these skills by social class. The NS-SEC social class measure was captured during a follow-up to the original study, carried out in 2004/05 (University of London, 2016b). At this point individuals would have reached an age of occupational maturity (around 34 years of age) (Goldthorpe, 1987).

Results and Discussion

Table 1 presents the results of an analysis of the ESS using OLS regression. The skill score variable described above is the dependent variable. Dummy variables are included for being a non-market organisation rather than a profit-making company; being an SME or not; being based in Scotland rather than the rest of the UK; and an interaction term between being based in Scotland and being an SME. The associations for non-market organisations and SMEs are statistically significant, which suggests that organisations with these characteristics are more likely to have fewer employees with a skill base capable of working with Big Data. This resonates with findings reported by E-Skills UK (2013), which suggest that SMEs are far less likely to make use of Big Data. Being based in Scotland is not significant and neither is the interaction term, suggesting that organisations located in Scotland are not any more likely to lack the Big Data base skills than organisations located elsewhere in the UK. Testing whether this finding is consistent is an important focus for my wider PhD study.

Table 1. OLS regression results; the dependent variable is the Big Data base skill score

                Coefficient    Standard error    P-value
Non market      0.554          0.098             0.000
SME             0.233          0.082             0.005
Scotland        0.003          -0.35             0.989
SME*Scotland    -0.154         -0.64             0.532
Constant        1.659          1.53              0.000
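To show how the coefficients in Table 1 combine, the following sketch computes the predicted skill score for different types of organisation. The coefficients are those reported in the table; the function and argument names are my own labels for illustration.

```python
# Coefficients as reported in Table 1.
COEFFICIENTS = {
    "constant": 1.659,
    "non_market": 0.554,
    "sme": 0.233,
    "scotland": 0.003,
    "sme_x_scotland": -0.154,
}


def predicted_score(non_market, sme, scotland):
    """Predicted skill score for an organisation.

    Each argument is a dummy variable coded 1 (yes) or 0 (no). The
    interaction term only contributes when the organisation is both an
    SME and based in Scotland.
    """
    b = COEFFICIENTS
    return (
        b["constant"]
        + b["non_market"] * non_market
        + b["sme"] * sme
        + b["scotland"] * scotland
        + b["sme_x_scotland"] * sme * scotland
    )


# Reference group: profit-making, not an SME, based outside Scotland.
baseline = predicted_score(non_market=0, sme=0, scotland=0)
# Profit-making SME based in Scotland.
scottish_sme = predicted_score(non_market=0, sme=1, scotland=1)
# Non-market organisation, not an SME, based outside Scotland.
non_market_org = predicted_score(non_market=1, sme=0, scotland=0)
```

The constant is the predicted score for the reference group, and each coefficient shifts that prediction when the corresponding dummy equals one.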

I used the BCS to examine whether there are any suggestions in the data of social, gender, and ethnic inequalities in the distribution of the maths test results. In order to do this, I looked at correlations between arithmetic scores from 1986 (University of London, 2016a) and later data from the module including measures of social class. Graph 2 shows the mean arithmetic scores from 1986, with confidence intervals, by NS-SEC from 2004/05. In this graph, routine occupations is the lowest occupational social class in the NS-SEC and higher managerial is the highest. A gradual decline of arithmetic scores in line with declining NS-SEC occupational social class is evident. This is indicative of a possible social divide in Big Data skills, although this only holds if statistical skills are a good indicator of Big Data skills, and more research on my part is necessary to find out if this is the case.
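As an illustration of the kind of summary behind a plot of group means with confidence intervals, the sketch below computes a mean and an approximate 95% interval for each social class. The scores are invented for the example and are not BCS data; the interval uses a normal approximation, which is an assumption that suits large samples better than the tiny ones shown here.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    m = statistics.mean(scores)
    # Standard error of the mean, based on the sample standard deviation.
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return m, m - z * se, m + z * se


# Invented arithmetic scores for two NS-SEC classes, for illustration only.
scores_by_class = {
    "Higher managerial": [42, 45, 39, 47, 44],
    "Routine": [30, 28, 35, 31, 26],
}

summary = {
    ns_sec: mean_with_ci(scores) for ns_sec, scores in scores_by_class.items()
}
```

Where the intervals for two classes do not overlap, the difference in means is unlikely to be due to sampling variation alone, which is the informal reading usually applied to a graph of this type.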

Graph 2


Conclusion

This post has proposed two proxy measures of Big Data skills using data from the Employer Skill Survey and the British Cohort Study. These proxies may be relevant for measuring the prevalence of Big Data skills in the general population and for assessing how social stratification relates to Big Data skills. Going forward, more research is needed to ensure that these measures are robust.

This work provides a starting point for me to examine social, gender, and ethnic inequalities in Big Data skills. Alongside my statistical analysis, I will be supplementing this with qualitative research in the form of interviews with skills providers, employers, and employees. My statistical measures will be revisited after interviews to examine if the measures that I have used thus far are valid proxy variables for Big Data skills. If this is not the case, I will collect additional primary data which can then be used in my analysis. I would be glad to receive any constructive feedback in respect of my study and to hear from anyone working on a related topic.

Acknowledgements: I would like to acknowledge the help of my project supervisors, Dr Alasdair Rutherford and Professor Paul Lambert. I would also like to thank Dr Roxanne Connelly for suggestions made for this paper. The PhD is funded by the ESRC.

Blog: https://alanainprogress.wordpress.com/
Twitter: @_AlanaMcGuire
Email: alana.mcguire@stir.ac.uk

References

Creswell, J.W., and Clark, V.L. (2011). Designing and Conducting Mixed Methods Research. Sage: London

Tashakkori, A., and Creswell, J.W. (2007). The new era of mixed methods. Mixed Methods Research. 1: pp.3-7.

E-Skills UK. (2013) Big Data Analytics: Adoption and Employment Trends, 2012-2017. Accessed online at <http://www.e-skills.com/Documents/Research/General/BigDataAnalytics_Report_Nov2013.pdf>

Goldthorpe, J. H. (1987) Social Mobility and Class Structure in Modern Britain, 2nd edition. Oxford: Clarendon Press.

Lagoze, C. (2014) Big Data, data integrity, and the fracturing of the control zone. Big Data & Society, pp.1-11.

Mellody, M. (2014). Training Students to Extract Value from Big Data: Summary of a Workshop. National Research Council.

Mitchell, T. (2006) The Discipline of Machine Learning. Accessed on 09/12/15 at <http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf>

Mullins, C. (2010) Defining Database Performance. Database Trends and Applications. Accessed on 9/12/15 at <http://www.dbta.com/Columns/DBA-Corner/Defining-Database-Performance-70236.aspx>

Office for National Statistics. Social Survey Division. (2016). Quarterly Labour Force Survey Household Dataset, January – March, 2016. [data collection]. UK Data Service. SN: 7991, http://dx.doi.org/10.5255/UKDA-SN-7991-1

UK Commission for Employment and Skills. (2016). Employer Skills Survey, 2013. [data collection]. 2nd Edition. UK Data Service. SN: 7484, doi: http://dx.doi.org/10.5255/UKDA-SN-7484-2.

University of London. Institute of Education. Centre for Longitudinal Studies. (2016a). 1970 British Cohort Study: Sixteen Year Follow-up, Arithmetic Test, 1986. [data collection]. 2nd Edition. UK Data Service. SN: 6095, doi: http://dx.doi.org/10.5255/UKDA-SN-6095-2.

University of London. Institute of Education. Centre for Longitudinal Studies. (2016b). 1970 British Cohort Study: Thirty-Eight-Year Follow-Up, 2008-2009. [data collection]. 4th Edition. UK Data Service. SN: 6557, doi: http://dx.doi.org/10.5255/UKDA-SN-6557-3.

Yiu, C. (2012). The Big Data Opportunity. Policy Exchange. Accessed online at <http://www.geomapix.com/pdf/big%20data.pdf>

The concealed middle? An exploration of ordinary young people and school GCSE subject area attainment

Christopher J. Playford and Vernon Gayle, University of Edinburgh

School examination results were historically a private matter, and the awareness of results day was usually confined to pupils, teachers and parents. School exam results are now an annual newsworthy item in Britain and every summer the British media transmit live broadcasts of groups of young people receiving their grades. This recurrent event illustrates, and reinforces, the importance of school-level qualifications in Britain.

The General Certificate of Secondary Education (GCSE) is the standard qualification undertaken by pupils in England and Wales at the end of year 11 (age 15-16). School GCSE outcomes are worthy of sociological examination because in the state education system they mark the first major branching point in a young person’s educational career and play a critical role in determining pathways in education and employment.

In our paper, we turned our attention to exploring school GCSE attainment at the subject-area level, rather than looking at overall outcomes or outcomes in individual GCSE subjects. This is an innovative approach to studying school GCSE outcomes. The initial theoretical motivation was to explore if there were substantively interesting combinations or patterns of GCSE outcomes, which might be masked when the focus is either overall outcomes or outcomes in individual subjects. Within the sociology of youth there has been a growing interest in the experiences of ordinary pupils who have outcomes somewhere between the obviously successful and unsuccessful levels, and this group have been referred to as the ‘missing middle’.

The data used in the paper are from the Youth Cohort Study of England and Wales (YCS), which is a major longitudinal study that began in the mid-1980s. It is a large-scale nationally representative survey funded by the government and is designed to monitor the behaviour of young people as they reach the minimum school leaving age and either remain in education or enter the labour market. School GCSE outcomes are challenging to analyse because there are many GCSEs available, there is an element of pupil choice in the diet of GCSEs that a pupil undertakes, some pupils study more GCSEs than others, each GCSE subject is awarded an individual grade on an alphabetical scale (A* being the highest and G being the lowest), and subject GCSE outcomes are highly correlated. We employ a latent variable approach as a practicable methodological solution to address the messy and complex nature of school GCSE outcomes.

In the paper we identify substantively interesting subject-level patterns of school-level GCSE outcomes that would be concealed in analyses of overall measures, or analyses of outcomes within individual GCSE subjects (see Table 1). The modelling process uncovers four distinctive latent educational groups. The first latent group is characterised by good GCSE outcomes, and another latent group is characterised by poor GCSE outcomes. There are two further latent groups with ‘middle’ or ‘moderate’ GCSE outcomes. These two latent groups have similar levels of overall (or agglomerate) outcomes, but one group has better outcomes in science GCSEs and the other has better outcomes in arts GCSEs.

Table 1. Latent group model results (four group model) school GCSE subject area outcomes.

Note: Youth Cohort Study of England and Wales, Cohort 6; all pupils gaining GCSE passes at grades A–G; n = 14,281; posterior probabilities and prior probabilities reported as percentages. Reproduced from Playford and Gayle 2016, Table 5 p.156.

Membership of the latent educational groups is highly stratified. Socially advantaged pupils are more likely to be assigned to group 1 ‘Good Grades’. In contrast, the pupils assigned to group 4 ‘Poor Grades’ are more likely to be from manual and routine socioeconomic backgrounds. The analyses uncovered two latent educational groups with similar levels of moderate overall school GCSE outcomes, but different overall patterns of subject level outcomes. A notable new finding is that pupils in latent educational group 2 ‘Science’, had a different gender profile to pupils in group 3 ‘Arts’, but both groups of pupils were from the same socioeconomic backgrounds.

Our paper is innovative because it documents a first attempt to explore patterns of school GCSE attainment at the subject area level in order to investigate whether there are distinct groups of pupils with ‘middle’ levels of attainment. The sociologist Phil Brown made the pithy statement that there is an invisible majority of ordinary young people who neither leave their names engraved on the school honours board nor gouged into the top of their desks. We conclude that such pupils are found in the two ‘middle’ latent educational groups. We see no obvious reasons why school exam results will not continue to be an annual newsworthy item and we suspect that the media focus is most likely to remain on pupils with exceptional outcomes rather than those with the more modest results that characterise the two ‘middle’ latent educational groups.

A new GCSE grading scheme is likely to be introduced from August 2017. A new set of grades ranging from 1 to 9 (with 9 being the highest) will replace the A*–G scheme. Early indications suggest that the older eight alphabetical grades (A*–G) will not map directly onto the new 1–9 grades, but there will be some general equivalence. Despite the potential reorganisation of GCSEs, and the proposed changes in the grading system, school level GCSEs will continue to be complicated and messy and the methodological approach used in this paper will be equally appealing for the analysis of more contemporaneous educational cohorts.

Playford, Christopher J., and Vernon Gayle. “The concealed middle? An exploration of ordinary young people and school GCSE subject area attainment.” Journal of Youth Studies 19.2 (2016): 149-168. DOI: 10.1080/13676261.2015.1052049

Council Housing and the Undeserving Poor

Kevin Ralston, University of Edinburgh

‘here we are paying them to have kids it beggars belief, people like this should be forced to live in poor houses like they did 100 years ago, I bet she would soon look for work then’
Scott, Norwich, online response to newspaper article

‘£1200 a month plus a free house…The benefits entitlement crowd is a waste of space the lure of an easy life on the state gives little encouragement to work’
db2712, London, online response to newspaper article

Last month (at the time of writing, August 2015) the Chancellor, Gideon ‘George’ Osborne, commended an emergency budget to the House of Commons. Included in the Chancellor’s fiscal revision was the removal of housing benefit for young people aged between 18 and 21 who are out of work. Young parents with children are exempted, as are the vulnerable and those who had been working for a six-month period prior to claiming (Summer Budget, 2015). Currently, to qualify for housing benefit an individual must be on a low income or claiming other benefits and have low savings.

The justifications that are made for the policy of general reductions in benefits pursued by this Government fall into two categories: (1) deficit reduction and (2) rebalancing to make benefits fairer by cutting excess monies being given to those who do not deserve them.

The idea that people are given houses and benefits they do not deserve is widely conveyed. In 2014 the Daily Mail published a story headlined ‘Jobless mum advises her daughter, 19, to get pregnant – for an easy life on benefits’. This included the statement that the daughter ‘became pregnant six months ago, and is now in line for an extra £400 a month courtesy of taxpayers when her baby is born, as well as a two-bedroom council house’.

This style of popular reporting is ubiquitous and encourages readers to draw negative moral conclusions regarding both the behaviour of the individuals highlighted and people who occupy council house tenure in general. The notion of the undeserving poor has coloured the political narrative around how the less well off should be engaged with for hundreds of years (Katz, 2013; Robbins, 2002). This is not only a condemnation of the individuals concerned but of the system, which is portrayed as allowing, and moreover positively encouraging, people to abuse it. The quotes given at the introduction to this piece suggest that articles like this one from the Daily Mail help generate, reflect and sustain a belief about the way in which council houses may be accessed, and about how a culture of state dependence may be perpetuated. This kind of narrative propagates the idea that young people have children in order to gain access to council housing, and also the perception that this happens in large numbers and as a matter of course. This is just one of the mechanisms through which socialised housing and supports have been under sustained attack in the UK since the 1980s (Malpass and Mullins, 2010).

There are several assumptions underlying the negative reports and views expressed around council house allocation and residence. There is the concept of benefits culture, that several generations have formed a dependence on state benefits (Macdonald et al., 2013). There is the assumption that the general circumstances and actions are undertaken by relatively large numbers of people, ‘the benefits entitlement crowd’. There is the understanding that access to benefits and social housing is too easy to achieve and simply triggered by shifts in circumstance, like becoming a parent. This is also associated with the implication that young people who live in council house tenure with their parents will simply have a child and move into their own council house. Finally, it is assumed or implied that the motivations that people have for doing this are morally questionable, being born of laziness or a wish to live at the cost of others.

Is there really a ‘benefits entitlement crowd’ accessing council housing in the way portrayed by the mainstream media? Recently I have been doing a bit of work using the British Household Panel Survey (BHPS) to quantify the numbers who actually move to council house tenure from different tenures and after having children. The BHPS is a longitudinal dataset which follows individuals over time. Members of the panel are interviewed at each wave, so that changes over time can be measured. The BHPS began in 1991 and ran until 2008, for 18 waves. The aim of the work is simply to assess the numbers of people to whom the Daily Mail characterisation may actually apply.

The sample I looked at included all young people documented as original sample members aged between 16 and 19 and who are recorded as children (own child, step-child or foster-child) to a household reference person (HRP – head of household) when first observed and who had not previously had children themselves. Anyone who becomes a panel member when they become 16 and is also a child of an original panel member is also included. The sample consists of 2271 cases. 941 of these are observed as switching from a child in the parental home to their own home.

Table 1. Tenure at move by whether it is preceded by a birth

                       % No-birth    % Birth    n
Owned/mortgage         93.07         6.93       332
Council Housing        74.29         25.71      105
Housing Association    78.18         21.82      55
Private Renting        97.17         2.83       389
Other Renting          94.87         5.13       39

n = 920
Chi-square p-value = 0.00; Cramér's V = 0.29
Source: BHPS waves 2-18

Of the 941 moves from the parental home to an independent household observed, 920 are to a known tenure. Of these, only 75 (8%) are preceded by the birth of a child. Around 26% of moves to council house tenure are preceded by a birth, and these account for 36% of all moves following a birth (27 of 75), with the majority of such moves going to other tenure statuses (see Table 1). This is disproportionate, as less than 12% of overall moves are to council house tenures, and the chi-square test and measure of association suggest a relationship between a prior birth and the tenure moved to. Despite the descriptive relationship whereby a birth precedes a move to a local authority rented house, this represents only a very small proportion of the total sample. Those who move to council house tenures following the birth of a child account for less than 3% of all the moves from the parental home observed (27 moves).
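The counts behind these proportions can be recovered from the row percentages and row totals in Table 1. The short sketch below does that arithmetic; the variable names are my own, and rounding to whole moves is an assumption needed because the table reports percentages to two decimal places.

```python
# Row percentages of moves preceded by a birth, and row totals, from Table 1.
TABLE_1 = {
    "Owned/mortgage": (6.93, 332),
    "Council Housing": (25.71, 105),
    "Housing Association": (21.82, 55),
    "Private Renting": (2.83, 389),
    "Other Renting": (5.13, 39),
}

# Convert each row percentage back to a whole number of moves.
birth_moves = {
    tenure: round(n * pct / 100) for tenure, (pct, n) in TABLE_1.items()
}

total_moves = sum(n for _, n in TABLE_1.values())
total_birth_moves = sum(birth_moves.values())

# Share of all moves that go to council housing.
council_share_of_all_moves = TABLE_1["Council Housing"][1] / total_moves
# Share of birth-preceded moves that go to council housing.
council_share_of_birth_moves = (
    birth_moves["Council Housing"] / total_birth_moves
)
# Birth-preceded council moves as a share of all moves to a known tenure.
council_birth_share_of_all_moves = birth_moves["Council Housing"] / total_moves
```

Comparing the second share with the first shows the disproportion, while the third shows how small the group is relative to all moves.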

Perhaps 3% constitutes a crowd. Even if all of this group were entirely dependent on benefits it is clearly only a tiny minority of young people to whom the Daily Mail characterisation would apply. More realistically, the media is generating, reflecting and maintaining a skewed perception about the numbers of young people who move to council housing, and their circumstances.

It is also questionable whether the Chancellor’s priority really is deficit reduction, or whether someone with a BA in Modern History (2:1) is really best placed to understand the technicalities of an unfathomably complex international economy, but those are separate issues altogether.

References:

Katz, M.B., 2013. The Undeserving Poor: America’s Enduring Confrontation with Poverty, 2nd ed. Oxford University Press, USA.
Macdonald, R., Shildrick, T., Furlong, A., 2013. In search of “intergenerational cultures of worklessness”: Hunting the Yeti and shooting zombies. Crit. Soc. Policy. doi:10.1177/0261018313501825
Malpass, P., Mullins, D., 2010. Local Authority Housing Stock Transfer in the UK: From Local Initiative to National Policy. Hous. Stud. 17.
Robbins, R.H., 2002. Global problems and the culture of capitalism, 2nd ed. Allyn and Bacon, Boston.
Summer Budget, 2015. HM Treasury, London.