QUESTIONS TO ASK BEFORE USING QUANTITATIVE DATA

Louise BEAUMAIS, 04/10/2022

Quantitative data are not objective: they are a social construct

In the last three decades, a broad corpus of literature – from political science, but also economics, law, anthropology, and sociology – has extensively questioned the use of quantitative data within contemporary Western societies. Various authors show how quantitative data are thought to embody “true knowledge”. Yet this is an illusion: data do not just exist, they have to be “generated”. To be “generated” implies two things. First, a definition process is necessary to determine what should be measured and how it should be measured. Second, only what is considered quantifiable is quantified – and, as a corollary, only what is considered useful for the analysis at a given time is quantified. Williams summarizes this perfectly: “a data set is already interpreted by the fact that it is a set: some elements are privileged by inclusion, while others are denied relevance through exclusion” (in Gitelman 2013).
There is therefore no such thing as “raw data”: data are always the result of a series of choices. Whatever their form (statistics, graphs, measurements, indices, rankings), we think of numbers as objective only because they have gone through a rationalization process that makes them look objective. When coming across quantitative data, practitioners should thus treat them as constructed social facts – and these social facts need to be questioned.
To alleviate potential biases related to the construction of the data, this sheet presents several questions practitioners can ask themselves in order to make a more reasoned use of quantitative data.

COMING ACROSS THE QUANTITATIVE DATA

Greenhill (2010) proposes the following questions, which should be routinely asked about the data themselves:
What are the sources of the numbers?
What definitions are the sources employing, and what exactly is being measured? 
What are the interests of those providing the numbers?
What do these actors stand to gain or lose if the statistics in question are – or are not – embraced and accepted?
What methodologies were employed in acquiring the numbers?
Do potentially competing figures exist, and, if so, what is known about their sources, measurement and methodologies? 
These elements allow you to make a first assessment of the quality of the data. Practitioners often consider that they have neither the time nor the technical skills required for this assessment; it is nonetheless essential to a more reflexive use of the data. Cognitive biases (see sheet n°2) lead practitioners to choose quantitative data that fit their prior beliefs, prejudices, and stereotypes. Not only will they have a (false) feeling of understanding the data more quickly, but the confirmation of their own ideas can keep them from this critical examination – which makes the assessment all the more useful. Technical skills may seem like a barrier at first, but with perseverance the process becomes easier. If you do not have the time to carry out this assessment, you can either refer to our Conflict Database Compass (CDC) or consider alternatives to quantitative data in your analysis.
Practitioners may find out that the actors providing the numbers have particular interests, and yet still need or want to use the data. There is no definite way to tell a “good” particular interest from a problematic one, given how subjective this is; it will depend on how practitioners believe these interests overlap with their own or with their institution’s. In this case, however, it remains important to mention these interests in the analysis, by detailing the sources of funding, the definitions used, and the potential biases related to the visual reproduction (choice of colors, indicators, etc.). Doing so may even add value to the analysis.
If you can’t find information about the sources, definitions used, or methodologies, it may be best to do without quantitative data in your analysis.

USING THE DATA IN YOUR ANALYSIS 

When using data in your analysis, you must know what you expect from them: what you want to do with the data, what you want to prove, what you want to visualize, and for what purpose. The following questions may help you become more aware of your expectations.
N.B.: If the analysis is commissioned by your superiors or is part of a specific project requiring the use of quantitative data, you can still ask yourself the following questions to get the most out of your quantitative data.
Is it necessary to use quantitative data – or, in other words, what will be the added value of the quantitative data in your analysis?
Do you want to prove a point with it? If yes, is the data gathered sufficient to draw conclusions?
Do you want to evaluate a trend, a pattern? If yes, is the data gathered sufficient to draw conclusions?
Do you want to illustrate a qualitative argument?
Are you more interested in the visual aspect of quantitative data than in their analytical value? If yes, how can you make sure not to bias the visual reproduction?
Are you sure the quantitative data you chose is valid to demonstrate your point?
For instance, military expenditures are often used as a proxy to gauge an adversary’s intentions, when in fact they do not mean much per se. They have to be put into perspective with other quantitative variables (for instance demography, the share of GDP actually allocated to defence, the number of active military personnel and reservists, etc.), tracked over time, and embedded in a qualitative analysis.
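As a purely illustrative sketch, the short Python snippet below shows one way of relating a raw expenditure figure to GDP and to personnel numbers over several years. The country, the figures, and the variable names are invented for the example and do not come from any real dataset.

```python
# Purely illustrative: contextualising a raw military-expenditure series.
# All figures and the country are invented for the example.

# Yearly observations: expenditure (bn USD), GDP (bn USD),
# active military personnel and reservists (thousands).
observations = {
    2019: {"milex_bn_usd": 42.0, "gdp_bn_usd": 1400.0, "personnel_k": 310},
    2020: {"milex_bn_usd": 45.5, "gdp_bn_usd": 1350.0, "personnel_k": 305},
    2021: {"milex_bn_usd": 48.0, "gdp_bn_usd": 1500.0, "personnel_k": 300},
}

for year, obs in sorted(observations.items()):
    # Expenditure as a share of GDP (%) and per member of the armed forces (USD).
    share_of_gdp = 100 * obs["milex_bn_usd"] / obs["gdp_bn_usd"]
    per_member = 1_000_000 * obs["milex_bn_usd"] / obs["personnel_k"]
    print(f"{year}: {obs['milex_bn_usd']:.1f} bn USD "
          f"({share_of_gdp:.2f}% of GDP, "
          f"{per_member:,.0f} USD per active/reserve member)")
```

The same simple normalisation can of course be done in a spreadsheet; the point is that the contextualised ratios, not the raw totals, carry the analytical meaning – and even these still require a qualitative interpretation.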

A WORD ON QUANTITATIVE DATA AND CONFLICT 

Conflict-related quantitative data are often far from meeting the rigorous expectations of “good quality” data – that is, data considered accurate, complete, reliable, relevant, and timely. This is so for several reasons, all related to the way the data are constructed:
Conflict-related concepts such as peace, civil war, or violence have no established definition and are still highly debated within the IR community (let alone their measurement!). This can be seen in the way “conflict” is defined by the different databases gathered in the Compass.
Collecting data in conflict-related contexts is extremely complex because of the dangerousness of conflict zones and the inaccessibility of certain areas. This clearly impairs the quality and reliability of the data, and it can also compromise their representativeness: some authors have even argued that scientific sampling (i.e. drawing a representative population sample) is usually not possible in conflict situations.
Some actors collecting or using data in conflict-related contexts may have incentives to misreport them, adapting the figures to their particular interests – for instance inflating them to alert authorities or citizens, or deflating them to deny a potential issue. This is all the more likely given the uncertainty surrounding data collection.
Data on casualties should always be treated with caution: pay attention to the source of the data, to which casualties are included in the definition, to who counts as a “child”, an “injured”, a “non-combatant” or a “civilian”, to what the different categories actually include, and also to the data’s “freshness”.

Some references on quantitative data construction and uses

Andreas, P., & Greenhill, K. M. (Eds.). (2011). Sex, drugs, and body counts: The politics of numbers in global crime and conflict. Cornell University Press.
Baele, S. J., Balzacq, T., & Bourbeau, P. (2017). Numbers in global security governance. European Journal of International Security, 3(1), 22–44.
Baele, S. J., Coan, T. G., & Sterck, O. C. (2018). Security through numbers? Experimentally assessing the impact of numerical arguments in security communication. The British Journal of Politics and International Relations, 20(2), 459–476.
Biruk, C. (2018). Cooking Data: culture and politics in an African research world. Duke University Press.
Dieckhoff, M., Martin, B., & Tenenbaum, C. (2016). Classer, ordonner, quantifier. In G. Devin (Ed.), Méthodes de recherche en relations internationales (pp. 247–266). Paris: Presses de Sciences Po.
Denis, J. (2018). Le travail invisible des données : éléments pour une sociologie des infrastructures scripturales. Paris: Presses des Mines.
Denis, J., & Goëta, S. (2013, February). La fabrique des données brutes. Le travail en coulisses de l’open data. In Penser l’écosystème des données. Les enjeux scientifiques et politiques des données numériques.
Fast, L. (2017). Diverging data: Exploring the epistemologies of data collection and use among those working on and in conflict. International Peacekeeping, 24(5), 706–732.
Gitelman, L. (Ed.). (2013). Raw data is an oxymoron. MIT Press.
Hansen, H. K., & Porter, T. (2012). What do numbers do in transnational governance? International Political Sociology, 6(4), 409–426.
Martin, B. (2015). Les quantifications dans l’expertise des organisations internationales. Le cas de l’UNODC. In A. Klein (Ed.), Les bonnes pratiques des organisations internationales (pp. 131–150). Paris: Presses de Sciences Po.
Merry, S. E. (2011). Measuring the world: Indicators, human rights, and global governance. Current Anthropology, 52(S3), S83–S95.
Rosga, A., & Satterthwaite, M. L. (2009). The trust in indicators: Measuring human rights. Berkeley Journal of International Law, 27, 253.
Rottenburg, R., Merry, S. E., Park, S. J., & Mugler, J. (Eds.). (2015). The world of indicators: The making of governmental knowledge through quantification. Cambridge University Press.
de Siqueira, I. R., Leite, C. C., & Beerli, M. J. (2017). Powered and disempowered by numbers: data issues in global governance. Global Governance, 27-30.