Colonialism and Its Legacies:
A Comprehensive Historical Dataset
A Project Funded by the National Science Foundation
(Proposals 0648292 and 0647921)
Principal investigators:
John Gerring, Department of Political Science, Boston University
James Mahoney, Department of Sociology and Political Science, Northwestern University

Collaborators:
Paul Barclay, Department of History, Lafayette College
Neil Englehart, Department of Political Science, Bowling Green State University
Andrew Harris, Department of Government, Harvard University
Jonathan Krieckhaus, Department of Political Science, University of Missouri-Columbia
Charles Kurzman, Department of Sociology, University of North Carolina
Patrick Manning, Department of History, University of Pittsburgh
Subhashish Ray, Department of Political Science, Rochester
James Robinson, Department of Government, Harvard University
Jennifer Rosen, Department of Sociology, Northwestern University
Nicolas van de Walle, Department of Government, Cornell University
Robert Woodberry, Department of Sociology, UT Austin
It has become commonplace to observe that the colonial experience shaped the modern era in profound ways. Colonial policies and practices are widely blamed for the underdevelopment of the South, the absence of significant industrialization, ethnic strife, weak state capacity, authoritarian rule, weak national identity, diffuse and porous borders, hunger, illiteracy, and corruption. Interestingly, colonialism is also sometimes praised for furthering social, political, and economic development in the South. Indeed, it is a central issue of dispute in the scholarly community whether colonialism fostered, or delayed, the development of the regions that it touched.
Africa provides a stunning example of these directly contradictory arguments. Conventionally, Africa’s developmental prospects were thought to have been hindered by colonial interference (e.g., Young 1994). Yet, the striking fact is that Africa experienced considerably less colonial intervention than most parts of the world. This has led some writers to claim, at least implicitly, that Africa’s problems at the present time are attributable to insufficient colonial influence (Herbst 2000; Mamdani 1996). In short, while there is general agreement that “colonialism mattered,” it is less clear what the long-term effects of this traumatic intervention actually have been. Indeed, the virulence of scholarly and popular opinions about colonialism is matched only by the inconclusiveness of current research (contrast Alam 2000 and Grier 1999).
Given that colonialism is a complex subject, evoking strong feelings, it is perhaps not surprising that the extensive study devoted to this weathered subject across the fields of the social sciences has not rendered a clear verdict on its legacy. Contributing to the inconclusiveness of this research are certain persistent methodological problems associated with the two dominant strategies of research – the case study (area study) and the global study (crossnational study) – which we now briefly review.
Usually, work on colonialism follows a case study approach that involves the intensive study of a particular country or region of the world (e.g., Brown 2000; Young 1994), or a particular colonial power (e.g., Armitage 2000). Alternatively, an author or a group of authors may cover the globe but do so with a series of case study analyses (e.g., Chamberlain 1998; Kohli 2004). While there are advantages to this way of approaching the subject – and our own approach builds self-consciously on precisely this sort of work – case study work is not designed to estimate typical causal effects for large populations. Rather, the goal of this literature is to understand the causes and consequences of colonialism in delimited contexts, leaving questions about general causal effects for other types of research. Thus, while case studies have taught us a great deal about the effects of colonialism in particular places and at particular times, they have not told us whether colonialism overall had a positive or negative impact on the non-western world, or in what specific ways it affected that world as a whole. Given that many writers do presume general effects, not simply contextually specific effects, there is a prima facie case for a general (global) approach to the problem of colonialism. In any case, whether colonialism left behind general effects cannot be determined by mere assertion; it requires comprehensive evidence in the form of a global dataset. The same may be said for the claim of historical specificity; “different effects in different places” is not a hypothesis that can be proven without systematic testing across all the cases (or a significant sub-sample thereof).
The opposite difficulty is posed by the (rather few) studies of colonialism that are truly global in scope – where a general hypothesis is tested across all countries (or all developing countries). The problem here is that the subject of interest, colonialism, is usually reduced to a single dimension, e.g., a) a dummy variable registering the predominant colonizer of a country (La Porta et al. 1999), b) a measure of “settler” versus “extractive” colonialism (Acemoglu, Johnson & Robinson 2001), or c) a variable measuring the number of years a country was under colonial control (Grier 1999). These sorts of studies are useful, though preliminary, attempts to systematize hypotheses drawn from the case study literature. However, they scarcely exhaust the topic. Indeed, the true effects of colonialism may not be apparent from an approach that reduces the topic to one or two dimensions.
Consider the debate over the relative importance of colonial institutions (Acemoglu, Johnson & Robinson 2001, 2002; North 1981) and geography (Diamond 1997; Olsson & Hibbs 2000; Sachs & Warner 1997) in structuring long-run economic development. While the measurement of geographic factors has advanced to include a host of highly differentiated variables (e.g., climate, soil, native perennial wild grasses, disease vectors, domesticable mammals, continental axis, latitude), the measurement of colonialism has been stuck at one or two (as listed above). Consequently, it has been impossible to provide a fine-grained test of any hypothesis connected with the colonial experience.
Likewise, students of political regimes have long suspected that patterns of colonialism may strongly shape national prospects to establish and maintain democracy. For example, cross-national statistical research demonstrates a relationship between colonial status and subsequent regime history (e.g., Bernhard, Reenock & Nordstrom 2004; Bollen 1979; Bollen & Jackman 1985). However, these findings are based on the use of one or two variables: dummies for the identity of the colonial power and (in some studies) a variable measuring the number of years under colonial rule. These are not unsophisticated studies in other respects, but their measurement of the key hypothesis is strictly circumscribed.
A survey of the immense literature on colonialism thus reveals two truisms. The case study literature is informative but also un-systematic, while the crossnational literature is systematic but not very discriminating. As a consequence, the topic of colonialism suffers from simultaneous promiscuity and neglect. The subject is ubiquitous in the contemporary fields of anthropology, economics, history, political science, and sociology, but is rarely studied in a detailed and systematic manner.
Our ambition is to marry the virtues of case study and crossnational approaches so that the influence of colonialism on the modern world -- whatever that may be -- can be measured in ways that are satisfying to scholars working with in-depth historical studies as well as global datasets. Specifically, we propose to develop a comprehensive dataset focused on colonialism that will stimulate future research concerning the causes and effects of colonialism by scholars in all fields of the social sciences -- regardless of method, theoretical framework, or substantive area of interest.
In addition to activities directly tied to colonialism, we seek to gather data on other key dimensions relevant to scholars who study colonialism and long-term patterns of development. These ancillary topics fall into eleven broad categories – democracy, the state, infrastructure, geography/trade, economic organization, demography, education, religion, language, ethnicity, and slavery – and are described at length in the appendix to this document.
Without such a comprehensive historical dataset we lack the means to adjudicate among rival causal hypotheses. What has colonialism wrought? Under what circumstances might colonialism leave favorable or unfavorable legacies? Is the causal effect of colonialism to be discovered in the immense variety of colonial experiences? If so, how shall we understand these experiences, and judge their effects? How might the various eras of globalization be compared with each other? Was colonialism in the eighteenth century, for example, significantly different from colonialism in the nineteenth and twentieth centuries? These kinds of questions motivate the data collection effort of this study. We hope not only to elucidate the fraught subject of colonialism but also to shed light on long-term patterns of development, a subject usually hostage to late-twentieth century datasets.
The study also has an implicit methodological goal. It is often noted that the fields of social science and history are rent by a central cleavage separating cross-national statistical researchers, who work with global datasets drawn from the postwar era, and those in the humanities (by discipline or by persuasion), who work with historical materials drawn from a single region or a small set of countries. The project at hand attempts to bridge these two camps, integrating the salient features of in-depth historical accounts into a single global dataset that stretches back over the centuries. As such, we hope it will provide a new way of doing business in the social sciences, one that is acceptable and accessible to both qualitative and quantitative researchers.
The Dataset Problem
Given that the work of social science is increasingly global in scope it is not surprising that global datasets have played an increasingly important role in the disciplines of political science, sociology, and economics. A short list of the most important and most frequently utilized datasets in these fields would include the following: Correlates of War, ICOW Colonial History dataset, Cross-National Time-Series Data Archive, International Integrated Public Use Microdata Series (IPUMS), Penn World Tables, Polity IV, State Failure Task Force, and World Development Indicators (see Table 1). Scholars rely on these datasets -- and many others -- for a wide range of tasks. They perform roughly the same role for comparativists that standard surveys such as the National Election Study and the General Social Survey perform for Americanists.
Yet, despite the prominence of crossnational data in contemporary research, existing datasets suffer from three generic problems. First, they are limited in temporal scope. Indeed, few global datasets reach back before 1950, and we have not found a single widely used global dataset that extends into the eighteenth century. Given that work in the social sciences increasingly deals with causal and descriptive propositions that extend back to the Enlightenment or to earlier historical eras, this lack of historical coverage must be regarded as a monumental lacuna.
Second, existing datasets often suffer from ambiguity about their sources, coding procedures, and methods of aggregation. Thus, although crossnational datasets have become staples of scholarly research, it is with considerable unease that scholars employ their variables. While some of these faults are inherent to the enterprise – collecting data globally is, after all, a daunting task – others may be corrected through careful attention to coding decisions, the incorporation of multiple sources, the recording of data in disaggregated form, detailed recording of procedures, and – perhaps most important of all -- reliance on the expertise of country specialists. These methodological issues are discussed at length below.
Third, existing datasets do not address the issue of colonialism in any detail. Usually, a single dichotomous variable for principal colonizer is included (e.g., British colonial origin). Thus, our dataset would constitute the first attempt to systematically examine and record the imprint of colonialism on the modern world.
Table 1:
A Sample of Extant Global Datasets
Dataset | Years | Subjects | Source, Location | Notes
Correlates of War | 1815- | International | Singer, Diehl (1990) |
Issue Correlates of War (ICOW) | | | http://garnet.acns.fsu.edu/ | The ICOW colonial history data set attempts to identify colonial or other dependency relationships for each state over the past two centuries. This includes states that have ruled each state as a colony, dependency, League of Nations mandate, UN trust territory, or other type of possession, as well as states that have seceded from existing states and states that have merged into existing states.
Cross-National Time-Series Data Archive | 1815- | Comparative | Banks (1994) |
International Integrated Public Use Microdata Series (IPUMS) | 1960- | Demography | http://international.ipums. | The world's largest collection of publicly available individual-level census data. Variables have been given consistent codes and have been documented to enable cross-national and cross-temporal comparisons.
Penn World Tables | 1950- | Economics | Heston, Summers (1991) |
Polity IV | 1800- | Democracy, | Marshall, Jaggers (2002) |
State Failure Task Force | 1955- | International | Goldstone et al. (2000) |
World Development Indicators | 1960- | Economics, | World Bank (2003) |
Hypotheses
In order to direct the data-collection process it is necessary to establish priorities. Which descriptive patterns and causal relationships warrant attention? What sorts of evidence can be coded in a fashion that is comparable across time and through space?
Data collection is always, at least implicitly, motivated by theory. Even so, we are wary of macro-theoretical frameworks that might limit the utility of the resulting dataset for scholars working in other schools and genres. To this end, we wish to avoid an overly theoretical orientation, e.g., Marxist, world-systems, Weberian, or neoclassical. In this light, our approach is fairly close-to-the-ground. Thus, rather than a central hypothesis, we list a large number of (generally interconnected) hypotheses in Table 2.
It should be clear that the purpose of this investigation is not solely to explore causal relationships but also descriptive patterns. Colonialism is of great intrinsic significance, influencing our views on a wide range of present-day phenomena, e.g., globalization, North/South relations, slavery, development, and what some have called “neo-imperialism.” Many of the assertions at issue in these contemporary debates concern what? questions, rather than (or in addition to) why? questions. Thus, our initial hypotheses include both causal and descriptive inferences.
It is our hope that, once completed, the data included in this project will generate new hypotheses. This, in turn, will undoubtedly stimulate further collection of data (which we expect will be integrated into the dataset). Social science is a dynamic process. But one must start somewhere. We offer the following list of hypotheses in an open-ended spirit, knowing that for each listed hypothesis there are dozens that we have not considered.
Table 2:
Hypotheses
COLONIALISM
British/other | British rule was different – more decentralized, more indirect, more democracy, and/or better governance.
Japan/other | Japanese rule was different from European rule – more interventionist, more developmental.
Africa/other | Africa was less intensively colonized than Latin America, South and Southeast Asia.
Property rights | Property rights were more likely to be established in colonies that attracted large numbers of settlers (AJR 2001, 2002).
Population | Densely settled indigenous areas were less likely (AJR), or more likely (Sokoloff & Engerman 2000), to be targets of settlement by Europeans.
Extractable | Regions with readily extractable resources (e.g., gold) were subject to more European settlers (a common assumption in the literature on Latin America).
DEVELOPMENT
British rule | British colonialism, by virtue of its greater local democracy, indirect rule, and/or effective civil service, leads to greater development.
Territorial | Continuity of borders, or at least the endurance of a “core” region within the colony, allows for a more successful transition during the post-independence era, and hence to greater development.
Colonial | Greater colonial intervention causes greater (Alam 2000; Grier 1999), or lesser (Young 1994), development.
Type of | Directly-ruled settler colonies have the strongest developmental performance; indirectly-ruled non-settler colonies have the worst.
Property rights | Areas with well established property rights experienced greater subsequent development (AJR 2001, 2002; North 1981).
Property rights and conflict | Reification of customary norms governing access to land and property in colonial law generated conflict over interpretation and enforcement of such laws (Chanock 1998, Colson 1974).
DEMOCRACY
British rule | British rule encouraged local- and national-level democracy, thus establishing norms and procedures that would help democracy survive in the post-independence era (Bernhard, Reenock & Nordstrom 2004; Bollen and Jackman 1985; Lipset et al., 1993; Weiner 1987).
Direct/indirect | Directly ruled colonies are more democratic later on because direct rule destroys traditional (and often undemocratic) power-holders (e.g., chiefs).
Colonial | European settlement produces democracy.
Pre-colonial | Experience with democratic procedures through pre-colonial legislatures helps to establish and protect democratic norms in the post-independence era.
Constructing the Dataset
There is no simple recipe for designing and pursuing a successful data collection project. Existing crossnational datasets, discussed above, provide both exemplary models and cautionary tales. They are exemplary insofar as they manage to capture, in quantitative form, a variety of indispensable concepts commonly used in comparative analysis. They are worrisome insofar as they have often failed to provide adequate explanation of their coding procedures and are subject to important measurement errors (e.g., Munck and Verkuilen 2002).
We aim to provide a more careful and thorough collection of numbers that includes a detailed codebook with specific variable definitions, primary and secondary sources, and additional notes (where appropriate). Our intention is to remain as close to the ground as possible in our coding decisions, which is to say that aggregated concepts will be employed only in conjunction with their component (disaggregated) parts, so that future scholars can re-visit the ground that we cover.
Variables
Our hope is to identify dimensions of politics, economy, and society that are valid across time and across regions. Ideally, these measures would also be applicable to a variety of political units including empires, nation-states, city-states, colonies, and so forth. Of course, we do not imagine that data will be equally available, or equally informative, for these diverse units. The point, rather, is that the coding categories should be valid for all units that are coded.
Variables are divided into twelve general categories: 1) colonial rule, 2) democracy, 3) the state, 4) infrastructure, 5) geography/trade, 6) economic organization, 7) demography, 8) education, 9) religion, 10) language, 11) ethnicity, and 12) slavery.
A complete list of variables falling into these categories, along with their definitions and potential sources, is located in the appendix. Note that most of the following variables could be conceptualized alternately as explanatory variables, control variables, or outcome variables, depending upon one’s theoretical proposition. (Only the geographic variables are, for most intents and purposes, exogenous.)
Note also that some variables are invariant, or occur only at one period of time (e.g., at the moment of the initial colonial encounter). Others vary considerably through time, and are thus properly coded in a time-series format. Some time-series variables are available only for contemporary years. Thus, Angus Maddison’s estimates for selected countries and territories extend back to 1800, although solid annual data for a broad cross-section of countries begins in 1950. Generally, we choose to include such variables even though they cannot be extended over the entirety of our chosen time-period. (This means that the resulting panel will be “unbalanced.”)
Coding
Our intention is to collect data on “natural” units of analysis, as defined by primary and secondary sources, leaving the task of aggregation for a later stage. This means that we must deal with a wide variety of units of analysis -- empires, continents, cultural zones, nation-states, subnational regions, cities, and so forth. These units overlap and, in many instances, change over time. The British Empire in 1700 evidently refers to a very different geographic entity than the British Empire in 1800.
In order to incorporate this complexity into the basic architecture of the dataset we plan to employ GIS mapping software. Evidently, we need to find a data structure that can preserve the original units of analysis (as drawn from primary sources) while offering the possibility of multiple aggregation techniques. For example, we need to be able to reconstruct the history of present-day nation-states, whatever their previous geo-political identities might have been.
Temporal units of analysis are also complicated. Data for many of our variables are available only at very irregular intervals. A few variables are much more precisely dated. In many cases, for example, we know the exact year and day that colonial administrators assumed office (Henige 1970). Again, the purpose of the dataset is to preserve as much precision in the original data as possible. Thus, we will code each variable according to a specific year – noting a more precise date, if available, in the notes attached to that data cell.
Some data will be available in convenient numerical form (e.g., number of colonial settlers in a region). Other information will have to be estimated on the basis of historical accounts or by expert coders. The dataset thus represents a mix of “objective” and “subjective” codings, and quantitative and qualitative data. Resulting variables are of all sorts: string (ordinary language), nominal, ordinal, and interval.
For each data point (cell), the dataset will note the following: a) variable (substantive data), b) spatial unit (i.e., country, colony, region, city, or town), c) year, d) source(s), and e) additional notes. The latter is an all-purpose field allowing us to comment on the viability of the source, disagreements among sources or coders, special coding rules, or any other facet of the data point that might be relevant. This cell-by-cell information system should make the task of any future re-coding immeasurably easier and allows for a full reporting of the procedures employed.
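The cell-level record described above can be sketched in code. The following is a minimal illustration in Python, not the project's actual storage format: the class name `DataCell`, its field names, the `lookup` helper, and all of the sample values are hypothetical placeholders chosen only to mirror items a) through e).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# A cell value may be string, nominal, ordinal, or interval; None marks missing data.
Value = Union[str, int, float, None]


@dataclass(frozen=True)
class DataCell:
    """One data point with the five pieces of information listed above.

    All names here are illustrative assumptions, not the project's schema.
    """
    variable: str                  # a) name of the variable
    value: Value                   # a) the substantive data itself
    spatial_unit: str              # b) country, colony, region, city, or town
    year: int                      # c) year to which the value refers
    sources: Tuple[str, ...] = ()  # d) one or more sources
    notes: str = ""                # e) all-purpose comments (dates, disagreements, rules)


# Made-up placeholder records: two sources disagree about the same unit/year.
cells = [
    DataCell("settlers", 1200, "Colony A", 1750, ("Source X",),
             "more precise date recorded in source"),
    DataCell("settlers", 1500, "Colony A", 1750, ("Source Y",),
             "disagrees with Source X"),
    DataCell("settlers", 2100, "Colony A", 1760, ("Source X",)),
]


def lookup(records: List[DataCell], variable: str, spatial_unit: str,
           year: int) -> List[DataCell]:
    """Return every recorded cell for one variable, unit, and year."""
    return [c for c in records
            if (c.variable, c.spatial_unit, c.year) == (variable, spatial_unit, year)]


matches = lookup(cells, "settlers", "Colony A", 1750)
```

Because each record carries its own sources and notes, competing values for the same unit and year can sit side by side, which is what would make later re-coding tractable.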
End-Products
At the end of the project we intend to produce three related datasets, as follows:
1. Primary dataset. Includes all data collected in the course of this project in “raw” (unadjusted) form, along with any new variables that we decide to code or to create (from disaggregated data). Wherever multiple sources measure the same concept (e.g., where we have population statistics for the same unit/years from multiple sources), we will include all of the original sources as separate variables in the dataset.
Where multiple measures for a single concept are available, we will also identify a single variable that we regard as the a) “best” and b) “most complete” version of a concept. If these two desiderata are difficult to reconcile we will create two variables for that concept -- one regarded as “best” and the other as “most complete.” Techniques of imputation (to combine data for a single variable from multiple sources measuring the same concept) and interpolation (to fill in missing data for a single time-series variable) may be used for the latter. This is a time-series dataset. However, constant (time-invariant) variables, e.g., for geographic features of a territory, will also be included and will be filled in for every year in the dataset. Variables composed of cumulative totals and averages will also be created, as described below (#3). In short, the primary dataset includes all data except that which is imputed through the multiple-imputation procedure (#2).
2. Imputed dataset. In addition to the primary dataset, we will also provide a more complete dataset where missing values for all variables and years are imputed. The purpose of “complete-ness” in this context is to provide a dataset that can be used for varied analyses -- descriptive, causal, and predictive – without biasing results by over-representing those parts of the world where richer data is available. At the same time, any techniques of aggregation that involve imputation of missing data should be reflected in a corresponding measure of uncertainty for that variable. Measures of uncertainty will therefore be reported along with each variable in the final product.
This imputed dataset begins with the “most complete” variables from the primary dataset. These are included in a multiple-imputation procedure (King et al. 2001) in order to obtain a truly complete dataset covering all territories and all time-periods. (A narrowing of the sample, or of the variables or units employed, may be required in order to run this imputation procedure. We may also reconstruct the annual data as decadal or centennial in order to limit the amount of missing data.)
3. Cross-sectional dataset. This tertiary dataset incorporates the data into a cross-sectional format, centered on the year 2010. Time-series variables will be created based on cumulative totals and/or averages. For a subset of variables, this dataset will provide information in four time-periods: a) prior to colonization, b) at the height of the colonial period (during the decade of greatest colonial influence), c) at independence (the approximate year in which a country attains formal sovereignty), and d) at present (the most recent year in the dataset).
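To make the relationship among these products concrete, the following Python sketch shows one simple way a “best” series, a “most complete” series, and a period average might be derived from two sources. It is an illustration under stated assumptions only: the rule of taking the first available source in priority order stands in for whatever combination technique is eventually adopted, linear interpolation is one of several possible gap-filling methods, the multiple-imputation step of dataset #2 is not shown, and all function names and numbers are hypothetical.

```python
from typing import Dict, List, Optional

Series = Dict[int, Optional[float]]  # year -> value (None = missing)


def coalesce(sources: List[Series], years: range) -> Series:
    """For each year, take the value from the first source (in priority
    order) that reports one. A simple stand-in for combining sources."""
    return {y: next((s[y] for s in sources if s.get(y) is not None), None)
            for y in years}


def interpolate(series: Series) -> Series:
    """Fill gaps between observed years by linear interpolation.
    Years before the first or after the last observation stay missing."""
    known = sorted(y for y, v in series.items() if v is not None)
    out = dict(series)
    for lo, hi in zip(known, known[1:]):
        for y in range(lo + 1, hi):
            if y in out and out[y] is None:
                frac = (y - lo) / (hi - lo)
                out[y] = series[lo] + frac * (series[hi] - series[lo])
    return out


def period_mean(series: Series, start: int, end: int) -> Optional[float]:
    """Average of the non-missing values in [start, end], as might be used
    for a cross-sectional summary variable."""
    vals = [v for y, v in series.items() if start <= y <= end and v is not None]
    return sum(vals) / len(vals) if vals else None


# Made-up placeholder data for one unit from two hypothetical sources.
years = range(1800, 1804)
source_a: Series = {1800: 100.0, 1803: 130.0}   # preferred source, sparse
source_b: Series = {1800: 90.0, 1801: 95.0}     # secondary source

best = {y: source_a.get(y) for y in years}                     # preferred source only
most_complete = interpolate(coalesce([source_a, source_b], years))
summary = period_mean(most_complete, 1800, 1803)
```

In this sketch the “best” series keeps its gaps, while the “most complete” series trades some reliability for coverage; in a real dataset the filled-in values would need to be flagged so that they could carry a corresponding measure of uncertainty.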
High-Order Concepts
Most of the variables listed in the appendix rest at a low level of abstraction. As such, they require little elaboration. We may have difficulty measuring the population of regions, but there is little doubt about what the concept “population” means and little doubt about its cross-temporal and cross-spatial validity. Yet, in order to divine meaningful patterns through time and across space it is necessary either to invest these lower-order concepts with higher-order meaning or to construct new concepts from these lower-order variables. In either case, these high-order concepts require some explanation. In the following section, we discuss ways of aggregating up to a small but critical set of large-order concepts: colonialism, political power, economic development, human development, political development, cultural transformation, and international power.
Note that these suggestions for concept aggregation are a secondary – and entirely optional -- product of the current project. We do not mean to “legislate” particular definitions and operationalizations. Our intention, rather, is to suggest ways in which large-order concepts might be constructed from the primary-level data, and thus to indicate some of the ways in which the variables included in the appendix might be employed in constructing broader historical narratives.
Colonialism
Measuring colonial rule requires a viable definition of colonialism, a term that has been variously understood (Abernethy 2000: 19-21; Esherick et al. 2006). We adopt a broad definition here, applicable to contiguous (“land”) and non-contiguous (“overseas”) empires. A colonial relationship is established when one region establishes an institutionalized and asymmetrical system of rule over another region, which is deemed culturally distinct and inferior. This system of rule may be direct or indirect; that is, the dominant power may establish its own rules and administrative bodies, or it may rule through indigenous bodies. However, the sovereignty (over-rule) of the dominant region must be formally recognized by political leaders in the metropole and the colony or by international law (in the post-Westphalian era). Control must extend, at the very least, to foreign policymaking prerogatives and must be asymmetrical. That is, the metropole must enjoy military and political superiority over the colony and the latter must be formally (de jure) constrained to follow dictates issuing from the metropole. It follows that indigenous inhabitants of the colony possess lesser rights and privileges than inhabitants of the metropole. A colonial relationship ends when (and if) the colony a) is fully incorporated into the metropole (it gains equal status within the larger unit) or b) gains independence (as recognized by political leaders in the metropole and the colony or by international law [in the post-Westphalian era]).
Having set forth this ideal-type definition, it is important to note that the colonial relationship tends towards greater ambiguity when applied to contiguous territories than when applied to non-contiguous territories. In the former case, each element of our definition is often pliable, causing confusion over which territories are rightly designated “colonies,” and how long this colonial relationship lasted. As an example, one might contrast Britain’s overseas colonies with its “internal” colonies (Scotland, Wales, and Ireland). The former are relatively easy to identify and to date, while the status of the latter is arguable through most periods of British history (Hechter 1975).
Additionally, some of the markers of colonialism are understandable only in the context of overseas (“far away”) relationships. For example, we can usually mark the date when an overseas territory was discovered and when the first enduring settlements were founded, while one would be at pains to code these variables for most land empires (where communication was continuous going back through recorded time).
Perhaps because of the clearly demarcated quality of overseas colonialism – its “obviousness,” both to participants and to latter-day commentators – the literature on this topic and the data sources we hope to exploit in this project are slanted toward overseas empires. This means, as a practical matter, that many of the variables listed below under the category “colonialism” will be easier to code for overseas empires than for land empires.
Thus, while we continue to employ the term in its general sense, we note the fact that the resulting dataset will better reflect the phenomenon of overseas colonial empires than the phenomenon of land empires. In subsequent iterations of this project, we hope to offer a fuller and more satisfactory accounting of the latter.
With respect to colonialism (contiguous and non-contiguous), we identify two fundamental dimensions: duration and intensiveness. The former is measured in years -- though it should be noted that dating the onset of colonial relationships is sometimes complicated, even in the case of overseas relationships. Thus, we offer several different principles for dating the onset of a colonial relationship, as explicated in the appendix. These may be combined into a summary measure.
The intensiveness of the colonial relationship is much more complex, both conceptually and operationally. We identify a large series of variables to capture this idea: the number of staff from the metropole allocated to the colony (as a share of the colonial administration and as a share of total population in the colony), the number of military personnel from the metropole (as a share of the colonial administration and as a share of total population in the colony), the number of settlers from the metropole (as a share of total population in the colony), legal penetration (the number of customary court cases/total number of court cases), and expenditures by the metropole for the upkeep of a colony (as a share of total exports or total population in the colony). As with duration, these components may be summarized in a single variable (perhaps by use of factor analysis, if the inter-correlations are fairly high).
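As a rough illustration of how such components might be summarized in a single variable, the following sketch standardizes each measure and extracts the first principal component. Principal components analysis is used here only as a simple stand-in for factor analysis; the function name, the column choice, and the example values are hypothetical and are not drawn from the dataset.

```python
import numpy as np

def intensiveness_index(X):
    """Return first-principal-component scores for the rows of X.

    X: 2-D array with one row per colony and one column per
    intensiveness measure (all names here are illustrative).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each measure so that differences in units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The first right singular vector of Z gives the component loadings.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # The sign of a component is arbitrary; orient it so that higher
    # scores correspond to a more intensive colonial presence.
    if loadings.sum() < 0:
        loadings = -loadings
    scores = Z @ loadings
    # Share of total variance captured by the first component.
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Hypothetical values, for illustration only.
# Columns: administrative staff share, military share, settler share.
example = np.array([
    [0.10, 0.020, 0.30],
    [0.05, 0.010, 0.10],
    [0.20, 0.050, 0.60],
    [0.02, 0.005, 0.01],
])
scores, loadings, explained = intensiveness_index(example)
```

A single summary score of this kind is informative only when the first component captures most of the variance, which is the practical meaning of the proviso that the inter-correlations be fairly high.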
While these are more or less direct measures of colonial intervention, other variables in the dataset may be regarded as outcome measures, i.e., measures of what the colonial presence actually accomplished. Thus, one may compare levels of political, social, and economic development before, during, and after a colonial episode. (Strategies of causal assessment are taken up in the following section.)
Economic Development
We rely centrally on demographic variables to measure the developmental capacity of societies prior to the modern era. Two demographic variables, urbanization and population of a state’s largest city, are looked upon as proxies for aggregate societal wealth and civilizational development (including technology, the division of labor, and the development of advanced forms of social and political organization) in periods prior to the demographic revolution (Acemoglu et al. 2002; Bairoch 1988). These are also periods in which the calculation of a gross domestic product is virtually impossible, since there was no formal economy to speak of (and, to make matters even more complicated, no common measures by which purchasing power parity could be observed crossnationally). This makes demographic variables all the more essential. Fortunately, most civilizations that we are aware of – and certainly all modern civilizations – were based on urban agglomerations. Cities and civilization went hand in hand (Bairoch 1988; Chandler 1987; Childe 1950; Modelski 2000).
In the twentieth century, and perhaps even the nineteenth, it becomes possible to arrive at reasonably good estimates of GDP per capita (Maddison 2001), which can be combined with demographic data from earlier periods to arrive at a comprehensive accounting of societal wealth throughout the world during the period covered in this survey (1400 to the present).
Another variable that may be helpful in charting economic growth within colonies and nation-states prior to the mid-twentieth century (when GDP statistics become widely available and reliable) is export revenue per capita (e.g., Manning 1982: 4).
Human Development
It is important to stress that the foregoing measures are regarded as proxies for aggregate wealth and civilizational development, not human development. Various privations associated with social inequality and oppression, of which slavery is the most egregious example, are entirely excluded from such demographic and economic variables. Indeed, it is likely that urbanization was associated with an increase in mortality rates prior to the twentieth century (Bairoch 1988). It would be folly, in any case, to equate the size of a political unit’s largest city, its level of urbanization, or its export revenue with the quality of life enjoyed by its inhabitants.
Measurements of human development are more problematic since estimates of mortality -- the most common indicator -- are available on a global scale only from the mid-twentieth century. Other indicators of health and literacy are even more limited in historical and geographic range.
For a small set of regions an available proxy for human physical wellbeing exists in the form of human stature. Stature, understood here as the average height of mature members of a specific human community, is the best measure – indeed, virtually the only measure – of human development prior to the tabulation of mortality rates (Bogin 1988; Bogin & Keep 1999; Eveleth & Tanner 1991; Komlos & Baten 1998; Steckel 1995; Steckel & Rose 2002; Steckel & Floud 1997). Human stature is highly sensitive to nutritional intake, particularly during childhood, and to insecurities in food provision that might disrupt nutritional intake. Since we have accurate measurements of stature potential – drawn from healthy specimens of present-day populations with similar genetic composition – biologists can easily calculate the degree to which previous populations achieved this potential. Particularly revelatory is the degree to which stature has varied over time in human populations around the world. Strong evidence suggests, for example, that stature declined in the initial aftermath of colonial interventions in Latin America. Granted, stature is not quite the same concept as overall mortality since, in principle, a population of tall adults might co-exist with high infant and child mortality rates (if children are dying of diseases that are not nutritionally related). However, prior to recent discoveries about disease, sanitation, and medicine we can expect that adult stature corresponded with mortality rates. Thus, we propose that stature is a useful proxy for human development through most eras covered in this study. As data on historical stature in different parts of the world becomes available, it will be integrated into the dataset and may provide a good measure of human development over the long run.
Political Development
Political development in a state – imperial, colonial, or otherwise – may be measured across several dimensions: the size of the bureaucracy (as a share of the total population), revenue (as a share of total output, exports, or population), and expenditures (as a share of total output, exports, or population).
As a summary measure of state legitimacy, we suggest counting the size of the military (as a share of the general population) or, where a separate domestic security force exists, the size of this police force (as a share of the general population). Presumably, legitimate states can sustain order with a smaller investment in domestic security. Granted, domestic insecurity is often exacerbated by internal conflicts – conflicts that may have little to do with the state. Even so, insofar as groups battle openly and the state is obliged to muster armed forces, this is a problem of political legitimacy.
Cultural Transformation
Constructing accurate and sensitive measures of cultural change is perhaps the most daunting task of all. One set of proxies involves linguistic and religious practices. If these change – if, for example, a region adopts the language and/or religion of its principal colonizer – there is strong reason to suppose that a wide-ranging cultural transformation has occurred. The speed and thoroughness of this transformation can presumably be tracked by the rate and extent to which indigenous practices disappear. Thus, the variables measuring linguistic and religious practices (see appendix) offer a crude tracking of broader societal transformations.
A related approach looks to changes in the racial complexion of a population as a clue to the cultural transformation of indigenous peoples and – equally important – how integrated/segregated these societies were, overall. Presumably, where the color line was shifting and indistinct, fewer barriers separated colonial and indigenous peoples.
These are all outcome-based approaches, to be sure. In order to get a sense of the inputs – i.e., the extent of direct cultural intervention on the part of a colonizing power – one may attempt to estimate the number of schools run by, or established by, the colonizer and the principal language of instruction in that school system(s).
International Power
The power of a state in the international arena may be measured by a) its military (permanent military personnel), b) its land area (the area over which it exercises sovereignty), and c) its population (the number of people over whom it exercises sovereignty).
Granted, it is sometimes tricky to determine questions of sovereignty (who controls which peoples within which territory). We make the simplifying assumption that a state is sovereign if its dominance over a territory is recognized by political leaders on the ground and/or by international law. We also assume that it controls peoples inhabiting that territory, even though we recognize of course that diverse peoples will be differentially responsive to directives from the center.
Methods of Analysis
Before concluding, we wish to pay explicit attention to the methods by which the data resulting from this study might be analyzed. The first anticipated use is purely descriptive, i.e., to show how countries and/or colonies fared at various points in their historical trajectory. Indeed, the most important use of this dataset may be primarily descriptive, allowing researchers to make better and broader comparisons through time and across space. This, in turn, may provide a helpful point of departure for focused case studies.
A second use of the dataset is to provide direct evidence of causal relationships. Our point of departure is the global cross-country research design, usually focused on the postwar decades (1950-). Although sometimes approached in a pooled time-series format, there is usually relatively little variation in key variables through this time period, and such variation as exists is manifestly non-experimental (and thus correlated with the error term). Complicating matters further is the extreme heterogeneity across units (nation-states). It is no wonder that the format has been strongly criticized in recent years (e.g., Kittel 2006; Rodrik 2005). Even so, for many questions of interest to scholars the crossnational regression remains among the best of all bad options.
While we take no position in this ongoing debate, it is worth noting that whatever confidence one might have in cross-sectional models depends largely upon whether the model is adequately specified. This, in turn, rests upon the intensity and variety of specification tests that a writer is able to apply to a given hypothesis (since “correct” benchmark models are virtually impossible to identify). Only if a result is robust in the face of many plausible specifications can it be regarded as providing strong evidence of a causal hypothesis. Such specification tests evidently depend upon the prior existence of a large set of correctly measured and crossnationally valid indicators – to be used as controls or as alternate measures of the key concept of interest. In short, crossnational regressions depend upon specification tests, and specification tests depend upon data. Once again, the crucial importance of the present project becomes apparent if we are to sustain the viability of this common mode of crossnational analysis.
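To make the idea of robustness across specifications concrete, the sketch below re-estimates an ordinary least squares coefficient on a key regressor under every subset of a set of candidate controls. It is one simple way to run such a check, not a procedure prescribed by the project; the function name and the simulated data are assumptions made for illustration.

```python
import itertools
import numpy as np

def coefficient_across_specs(y, x, controls):
    """Estimate the OLS coefficient on x under every subset of controls.

    y: outcome array (n,); x: key regressor array (n,);
    controls: dict mapping a control name to an array (n,).
    Returns a list of (control_names, coefficient_on_x) pairs.
    """
    names = list(controls)
    results = []
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            # Design matrix: intercept, key regressor, chosen controls.
            cols = [np.ones_like(x), x] + [controls[c] for c in subset]
            design = np.column_stack(cols)
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            results.append((subset, float(beta[1])))
    return results

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
c1 = rng.normal(size=n)
c2 = rng.normal(size=n)
y = 2.0 * x + 1.0 * c1 + rng.normal(size=n)

results = coefficient_across_specs(y, x, {"c1": c1, "c2": c2})
# A result would be called robust here if the coefficient keeps its
# sign and rough magnitude across all specifications.
sign_stable = all(b > 0 for _, b in results)
```

The number of specifications grows as two to the power of the number of controls, so with many candidate controls one would sample specifications rather than enumerate them.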
At the same time, one of the anticipated benefits of the Colonial Legacies project is to open up new methodological approaches. In particular, we hope to offer scholars the possibility of exploiting useful variation over time. Note that the projected dataset will collect information on key variables at annual, decadal, or centennial intervals (depending upon data availability). This means that the resulting data may be examined in a panel format over a much longer period of time, and this opens the way for what may prove, in some circumstances, a more productive use of the panel format.
Since the project will collect some data for spatial units much smaller than the contemporary nation-state (e.g., cities and regions) it may also be possible to perform analyses that are more disaggregated than the typical cross-country regression. These analyses might center on spatial units chosen according to available GIS formats, e.g., geographically circumscribed areas or hectares. The possibilities for new spatial units of analysis are, in principle, unlimited, and may greatly change our capacity to model causal relations through time.
An additional approach, also relying on temporal variation, focuses on the histories of territories whose colonial ruler changed – from Dutch to British in New York and in South Africa, for example. These cases, which are quite numerous throughout the world, offer critical evidence for any hypothesis concerning the effects of colonial rulership, for the ceteris paribus assumption of causal analysis is likely to be satisfied if the comparison is restricted to periods prior to, and after, the changeover in control.
A final methodology compares the performance of colonies in decades just prior to, and after, the achievement of independence. This approach regards independence as an exogenous “treatment,” allowing for pre- and post-tests along various dimensions of development (political, social, and economic). Of particular importance in this sort of analysis is the construction of appropriate controls for regional and global trends that might otherwise produce spurious findings. Also important is a wide-angle focus on either side of the independence divide so that temporary effects associated with this unique political rupture are neutralized.
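A minimal sketch of this pre- and post-independence comparison follows, assuming annual series stored as year-to-value mappings. The change in the territory's mean outcome across the independence year is adjusted by the change in a regional comparison series over the same windows, a simple difference-in-differences. The function names, the choice to exclude the independence year itself, and the example values are illustrative assumptions.

```python
def pre_post_change(series, event_year, window):
    """Mean over the `window` years after event_year minus the mean
    over the `window` years before it. The event year is excluded.

    series: dict mapping year (int) to value (float).
    """
    pre = [series[y] for y in range(event_year - window, event_year)
           if y in series]
    post = [series[y] for y in range(event_year + 1, event_year + 1 + window)
            if y in series]
    if not pre or not post:
        raise ValueError("no observations on one side of the event year")
    return sum(post) / len(post) - sum(pre) / len(pre)

def trend_adjusted_change(territory, regional, event_year, window):
    """Territory's pre/post change net of the regional pre/post change."""
    return (pre_post_change(territory, event_year, window)
            - pre_post_change(regional, event_year, window))

# Hypothetical series: both follow the same linear trend, and the
# territory gains 5 units after independence in 1960.
regional = {y: float(y - 1950) for y in range(1950, 1971)}
territory = {y: (y - 1950) + (5.0 if y > 1960 else 0.0)
             for y in range(1950, 1971)}

adjusted = trend_adjusted_change(territory, regional, 1960, window=10)
```

A wider `window` corresponds to the wide-angle focus described above, since it dilutes temporary effects that cluster around the moment of independence.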
Advancing Historical Research in the Social Sciences
Social science theories increasingly recognize that historical events and processes (i.e., events lying in the distant past) are critical for the explanation of contemporary outcomes. Social scientists are also increasingly inclined to seek global answers for significant questions and problems. Global history is here to stay. Indeed, it is increasingly recognized that most of the various processes attendant upon “globalization” are by no means novel to the twentieth century.
However, researchers in the social sciences are often unable to devise adequate empirical tests for propositions rooted in the distant past. This is not primarily the fault of the researchers. The problem is that data for prior historical periods is limited, and that which exists is often suspect, requiring a deep and nuanced knowledge of a particular time-period and region. Evidence constructed in this painstaking fashion seems to resist all but the most anodyne generalizations and hence provides rather unpropitious ground for social-scientific theorizing.
We are well aware that data recovery for the fifteenth century will never match the quality and quantity of data available for the twentieth century. Yet, we are equally convinced that much more can be done to collect the data that lies out there already (in the form of secondary sources and specialized datasets), to provide new codings of substantively important topics, and to make this information more widely available to scholars. We regard this project as an important step in this direction, and one that will greatly enhance the ability of social scientists to test the propositions suggested by their increasingly historically-oriented theories. In much the same way that Polity IV, Correlates of War, and the World Development Indicators now function as standard references for the study of nation-states in the modern era, we anticipate that this new dataset may serve as the leading source of quantitative and qualitative information for those investigating periods prior to the nineteenth century – a jumping-off place for research on all conceivable topics.
Plan of Action
The project is funded for three years, beginning in Fall 2007. Over this period, we anticipate three phases of activity. In the first phase, we plan to incorporate all existing historical data relevant to colonialism and development that is relatively easy to collect, i.e., information that can be drawn from existing datasets or printed sources. In the second phase, we will begin coding original data for additional variables, or adding data to provide more complete or more reliable coverage for existing variables. Decisions on which topics (i.e., variables) to address will be based on three general criteria: a) ease of collection, b) data reliability, and c) expected theoretical yield. In the third and final phase we will combine the raw data into a series of composite variables and sub-set datasets, as described above.
A project this expansive has no definitive point of completion. No matter how long we labor, there will always remain significant shortcomings -- in data coverage, data reliability, and theoretical scope. This is true, naturally, of all projects. But it is particularly true of a project that aims to discover patterns on a global scale and in the distant past. Our long-term objective, therefore, is to ensure that this project will be maintained – amended, emended, perhaps even fundamentally reconceptualized -- into the future. Just as other datasets (WDI, PWT, State Failure, Polity) have endured, so, we imagine, the Colonial Legacies dataset might endure. To this end, we hope to create a community of scholars who are sufficiently committed to the project that they will lend their expertise, and their time, to ensure its future.
Note (on the claim that cities and civilization went hand in hand): A partial exception might be made for the Egyptian empire (Modelski 2000: 25-6) and the Roman Empire in its later stages, when wealth migrated from Rome to large latifundia-style estates situated in rural regions around the empire. However, the fact that this movement was associated with the empire’s decline is not coincidental.
References
Abernethy, David B. 2000. The Dynamics of Global Dominance: European Overseas Empires, 1415-1980. New Haven: Yale University Press.
Acemoglu, Daron; Simon Johnson; James A. Robinson. 2001. “The Colonial Origins of Comparative Development: An Empirical Investigation.” American Economic Review 91, 1369-1401.
Acemoglu, Daron; Simon Johnson; James A. Robinson. 2002. “Reversal of Fortune: Geography and Institutions in the Making of the Modern World Income Distribution.” Quarterly Journal of Economics 117, 1231-94.
Alam, M. Shahid. 2000. Poverty from the Wealth of Nations: Integration and Polarization in the Global Economy since 1760. Houndsmills, Basingstoke: Macmillan.
Armitage, David. 2000. The Ideological Origins of the British Empire. Cambridge: Cambridge University Press.
Bairoch, Paul. 1988. Cities and Economic Development: From the Dawn of History to the Present. Chicago: University of Chicago Press.
Banks, Arthur S. 1994. “Cross-National Time-Series Data Archive.” Center for Social Analysis, State University of New York at Binghamton. Binghamton, New York.
Benjamin, Thomas (ed). 2006. Encyclopedia of Western Colonialism since 1450. New York: Macmillan.
Bergesen, Albert; Ronald Schoenberg. 1980. “Long Waves of Colonial Expansion and Contraction 1415-1970.” In Albert Bergesen (ed), Studies of the Modern World System (New York: Academic Press) 231-78.
Bernhard, Michael; Christopher Reenock; Timothy Nordstrom. 2004. “The Legacy of Western Overseas Colonialism on Democratic Survival.” International Studies Quarterly 48, 225-50.
Bogin, Barry. 1988. Patterns of Human Growth. Cambridge: Cambridge University Press.
Bogin, Barry; R. Keep. 1999. “Eight Thousand Years of Economic and Political History in Latin America Revealed by Anthropometry.” Annals of Human Biology 26:4, 333-51.
Bollen, Kenneth A. 1979. “Political Democracy and the Timing of Development.” American Sociological Review 44: 572-87.
Bollen, Kenneth A.; Robert W. Jackman. 1985. “Political Democracy and the Size Distribution of Income.” American Sociological Review 54: 612-21.
Boswell, Terry. 1989. “Colonial Empires and the Capitalist World-Economy: A Time Series Analysis of Colonization, 1640-1960.” American Sociological Review 54:2 (April) 180-96.
Braibanti, Ralph (ed). 1966. Asian Bureaucratic Systems Emergent from the British Imperial Tradition. Durham: Duke University Press.
Brown, David S. 2000. “Democracy, Colonization, and Human Capital in Sub-Saharan Africa.” Studies in Comparative International Development 35:1, 20-40.
Brown, Michael E. 1997. “The Impact of Government Policies on Ethnic Relations.” In Michael E. Brown and Sumit Ganguly (eds), Government Policies and Ethnic Relations in Asia and the Pacific (Cambridge: MIT Press).
Carlson, W. Bernard. 2005. Technology in World History, 7 vols. Oxford: Oxford University Press.
Carneiro, Robert L. 1970. “A Theory of the Origin of the State.” Science 169, 733-38.
Chamberlain, Muriel Evelyn (ed). 1998. The Longman Companion to European Decolonisation in the Twentieth Century. Addison-Wesley.
Chandler, Tertius. 1987. Four Thousand Years of Urban Growth: An Historical Census. Lewiston, NY: St. David’s University Press.
Chase-Dunn, Christopher; Thomas Reifer. 2003. The Social Foundations of Global Conflict and Cooperation: Waves of Globalization and Global Elite Integration Since 1840. NSF Grant, University of California-Riverside.
Childe, Gordon V. 1950. “The Urban Revolution.” Town Planning Review 21:1, 3-17.
Clark, Grover. 1936. The Balance Sheets of Imperialism: Facts and Figures on Colonies. New York: Columbia University Press.
Cohen, Ronald; Elman R. Service (eds). 1978. Origins of the State: The Anthropology of Political Evolution. Philadelphia: Institute for the Study of Human Issues.
Curtin, Philip D. 1989. Death by Migration: Europe’s Encounter with the Tropical World in the Nineteenth Century. Cambridge: Cambridge University Press.
Diamond, Jared. 1997. Guns, Germs, and Steel: The Fates of Human Societies. New York: Norton.
Eggimann, Gilbert. 1999. La Population des villes des Tiers-Mondes, 1500-1950. Geneva: Centre d’histoire economique Internationale de l’Universite de Geneve, Libraire Droz.
Eisenstadt, S.N.; Stein Rokkan (eds). 1973. Building Nations and States: Models and Data Resources, vol. 1. Beverly Hills, CA: Sage.
Englebert, Pierre. 2000. State Legitimacy and Development in Africa. Boulder: Lynne Rienner.
Esherick, Joseph W.; Hasan Kayali; Eric van Young (eds). 2006. Empire to Nation: Historical Perspectives on the Making of the Modern World. Lanham, MD: Rowman & Littlefield.
Etemad, Bouda. 2000. Possession du monde: Poids et mesures de la colonisation (XVIIIe-XXe Siecles). Bruxelles: Editions complexes.
Eveleth, Phyllis B.; James M. Tanner. 1991. Worldwide Variation in Human Growth, 2d ed. Cambridge: Cambridge University Press.
Fieldhouse, D.K. 1966. The Colonial Empires: A Comparative Study from the Eighteenth Century. London: Macmillan.
Frankema, Ewout. 2006. “The Colonial Origins of Inequality: Exploring the Causes and Consequences of Land Distribution.” Research Memorandum GD-81, Groningen Growth and Development Centre.
Goldstone, Jack A. et al. 2000. “State Failure Task Force Report: Phase III Findings.” [Available at http://www.cidcm.umd.edu/inscr/stfail/SFTF%20Phase%20III%20Report%20Final.pdf]
Grier, Robin M. 1999. “Colonial Legacies and Economic Growth.” Public Choice 98:317-335.
Hailey, Lord (W.M.). 1945. An African Survey: A Study of Problems Arising in Africa South of the Sahara. London: Oxford University Press.
Hailey, Lord (W.M.). 1957. An African Survey: A Study of Problems Arising in Africa South of the Sahara, Revised ed. London: Oxford University Press.
Hailey, Lord (W.M.). 1979. Native Administration in the British African Territories, 5 vols. Colonial Office.
Hechter, Michael. 1975. Internal Colonialism: The Celtic Fringe in British National Development, 1536-1966. Berkeley: University of California Press.
Henige, David P. 1970. Colonial Governors from the Fifteenth Century to the Present. Madison, WI: University of Wisconsin Press.
Hensel, Paul. [various years]. “ICOW Colonial History Data Set.” http://garnet.acns.fsu.edu/~phensel/
Herbst, Jeffrey. 2000. States and Power in Africa: Comparative Lessons in Authority and Control. Princeton: Princeton University Press.
Heston, Alan; Robert Summers. 1991. “The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950-1988.” Quarterly Journal of Economics (May) 327-68.
Jacobson, Harold K. 1968. “United Nations and Colonialism, 1946-1967.” ICPSR Study No. 5513.
Jones, Eric L. 1981. The European Miracle: Environments, Economics and Geopolitics in the History of Europe and Asia, 2d ed. Cambridge: Cambridge University Press.
King, Gary; James Honaker; Anne Joseph; Kenneth Scheve. 2001. “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation.” American Political Science Review 95:1 (March) 49-69.
Kittel, Bernhard. 2006. “A Crazy Methodology?: On the Limits of Macroquantitative Social Science Research.” International Sociology 21, 647-77.
Klein, Herbert S. 1998. The American Finances of the Spanish Empire: Royal Income and Expenditures in Colonial Mexico, Peru, and Bolivia, 1680-1809. University of New Mexico Press.
Kohli, Atul. 2004. State-Directed Development: Political Power and Industrialization in the Global Periphery. Cambridge: Cambridge University Press.
Komlos, John; Joerg Baten (eds). 1998. The Biological Standard of Living in Comparative Perspective. Stuttgart: Franz Steiner Verlag.
Kuczynski, R.R. 1948. Demographic Survey of the British Colonial Empire, vol. 1. London: Oxford University Press.
Kuczynski, R.R. 1949. Demographic Survey of the British Colonial Empire, vol. 2. London: Oxford University Press.
Kuczynski, R.R. 1953. Demographic Survey of the British Colonial Empire, vol. 3. London: Oxford University Press.
Kurzman, Charles; Erin Leahey. 2004. “Intellectuals and Democratization, 1904-1912 and 1988-1996.” American Journal of Sociology 109:4.
Lange, Matthew. 2003. “The British Colonial Lineages of Despotism and Development.” Dissertation, Department of Sociology, Brown University.
La Porta, Rafael; Florencio Lopez-de-Silanes; Andrei Shleifer; Robert W. Vishny. 1998. “Law and Finance.” Journal of Political Economy 106:6.
La Porta, Rafael; Florencio Lopez-de-Silanes; Andrei Shleifer; Robert W. Vishny. 1999. “The Quality of Government.” Journal of Law, Economics, and Organization 15:1, 222-79.
Lipset, S. M.; K. Seong; J. C. Torres. 1993. “A Comparative Analysis of the Social Requisites of Democracy.” International Social Science Journal 136: 155-75.
Macfarlane, Alan. 1978. The Origins of English Individualism: The Family, Property, and Social Transition. Cambridge: Cambridge University Press.
Maddison, Angus. 2001. The World Economy: A Millennial Perspective. Paris: OECD.
Mahoney, James. 2003. “Long-Run Development and the Legacy of Colonialism in Spanish America.” American Journal of Sociology 109:1.
Mamdani, Mahmood. 1996. Citizen and Subject: Decentralized Despotism and the Legacy of Late Colonialism. Oxford: Oxford University Press.
Manning, Patrick. 1982. Slavery, Colonialism and Economic Growth in Dahomey, 1640-1960. Cambridge: Cambridge University Press.
Marshall, Monty G.; Keith Jaggers. 2002. “Polity IV Project: Political Regime Characteristics and Transitions, 1800-2002.” Manuscript. [Available at http://www.cidcm.umd.edu/inscr/polity/]
Masters, William A.; Margaret S. McMillan. 2000. “Climate and Scale in Economic Growth.” Journal of Economic Growth 6, 167-86.
McEvedy, Colin; Richard Jones. 1978. Atlas of World Population History. New York: Facts on File.
Mitchell, Brian R. 2003a. International Historical Statistics: Africa, Asia and Oceania, 1750-1993, 3d ed. London: Macmillan.
Mitchell, Brian R. 2003b. International Historical Statistics: The Americas, 1750-2000, 5th ed. London: Macmillan.
Mitchell, Brian R. 2003c. International Historical Statistics: Europe, 1750-1993, 4th ed. London: Macmillan.
Murdock, George P. 1967. Ethnographic Atlas. Pittsburgh: University of Pittsburgh Press.
Modelski, George. 2000. World Cities: -300 to 2000. Washington: Faros.
Munck, Gerardo L.; Jay Verkuilen. 2002. “Conceptualizing and Measuring Democracy: Evaluating Alternative Indices.” Comparative Political Studies 35:1 (Feb) 5-34.
North, Douglass C. 1981. Structure and Change in Economic History. New York: Norton.
Olsson, Ola; Douglas A. Hibbs, Jr. 2000. “Biogeography and Long-Run Economic Development.” Working Papers in Economics No. 26 (August), Department of Economics, Goteborg University.
Pearson, David. 2001. The Politics of Ethnicity in Settler Societies. Basingstoke: Palgrave.
Pierson, Paul; Theda Skocpol. 2002. “Historical Institutionalism in Contemporary Political Science.” In Ira Katznelson, Helen V. Milner (eds), Political Science: The State of the Discipline (New York: W. W. Norton) 693-721.
Posner, Daniel. 1999. “The Colonial Origins of Ethnic Cleavages: The Case of Linguistic Divisions in Zambia,” presented at the Spring 2001 Meeting of the Laboratory in Comparative Ethnic Processes (LiCEP), Harvard University, 23 March 2001; and the James S. Coleman African Studies Center Seminar, UCLA.
Posner, Daniel. 2000. “Measuring Ethnic Identities and Attitudes Regarding Inter-group Relations: Methodological Pitfalls and a New Technique,” presented at the Fall 2000 Meeting of the Laboratory in Comparative Ethnic Processes (LiCEP), University of Pennsylvania.
Putterman, Louis. 2003. “State Antiquity Index (‘Statehist’), Version 2.” See http://www.econ.brown.edu/fac/Louis_Putterman/
Rodrik, Dani. 2005. “Why We Learn Nothing from Regressing Economic Growth on Policies.” Ms.
Sachs, Jeffrey D.; Andrew Warner. 1997. “Fundamental Sources of Long-run Growth.” American Economic Review 87:2, 184-8.
Singer, J. David; Paul Diehl (eds). 1990. Measuring the Correlates of War. Ann Arbor: University of Michigan Press.
Sokoloff, Kenneth L.; Stanley L. Engerman. 2000. “Institutions, Factor Endowments, and Paths of Development in the New World.” Journal of Economic Perspectives 14:3 (Summer) 217-32.
Steckel, Richard H. 1995. “Stature and the Standard of Living.” Journal of Economic Literature 33 (December) 1903-40.
Steckel, Richard H.; Jerome C. Rose (eds). 2002. The Backbone of History. Cambridge: Cambridge University Press.
Steckel, Richard H.; Roderick Floud (eds). 1997. Health and Welfare during Industrialization. Chicago: University of Chicago Press.
Stewart, John. 1996. The British Empire: An Encyclopedia of the Crown's Holdings, 1493 through 1995. McFarland & Co.
Stewart, John. 1999. African States and Rulers, 2d ed. Jefferson, NC: McFarland & Company.
Strang, David. 1990. “From Dependency to Sovereignty: An Event History Analysis of Decolonization, 1870-1987.” American Sociological Review 55 (December) 846-60.
Strang, David. 1991. “Global Patterns of Decolonization, 1500-1987.” International Studies Quarterly 35:4 (December) 429-54.
Thorp, Rosemary. 1998. Progress, Poverty, and Exclusion: An Economic History of Latin America in the 20th Century. New York: Inter-American Development Bank.
TePaske, John J.; Herbert S. Klein. 1982. The Royal Treasuries of the Spanish Empire in America. Durham, N.C.: Duke University Press.
United Nations Educational, Scientific and Cultural Organization. 1957. World Illiteracy at Mid-Century, a Statistical Study. Paris: UNESCO.
Weiner, Myron. 1987. “Empirical Democratic Theory.” In Myron Weiner & E. Ozbudun (eds), Competitive Elections in Developing Countries (Durham: Duke University Press) 3-34.
Wilkinson, Steven I. In process. Colonization, Institutions and Conflict. Book manuscript.
Woodberry, Robert D. In process. Book manuscript on colonial missionaries.
World Bank. 2003. World Development Indicators 2003. Washington, DC: International Bank for Reconstruction and Development.
Young, Crawford. 1994. The African Colonial State in Comparative Perspective. New Haven: Yale University Press.
Appendix:
Variables and Sources
Sources listed below refer to primary sources of data (i.e., published work or publicly available datasets) or, in some cases, to works that offer discussions or examples of the hypothesis that a variable represents. In addition, we wish to acknowledge certain general sources, of use for a wide range of indicators. These include: Benjamin (2006), Carlson (2005), Chase-Dunn & Reifer (2003), Clark (1936), Correlates of War dataset (Singer & Diehl 1990), Cross-National Time-Series Data Archive (Banks 1994), Eisenstadt & Rokkan (1973), Etemad (2000), Frankema (2006), Hailey (1945, 1957), Henige (1970), Hensel (various years), Jacobson (1968), Kuczynski (1953), The Making of the Modern World (digital facsimile of 61,000 works of literature on economics and business from 1450 through 1850; combines the Kress Collection of Business and Economics at the Baker Library, Harvard Business School and the Goldsmiths’ Library of Economic Literature at the University of London Library), Mitchell (2003a, 2003b, 2003c), Penn World Tables (Heston & Summers 1991), Polity IV dataset (Marshall & Jaggers 2002), State Failure Task Force dataset (Goldstone et al. 2000), Statesman’s Yearbook (various years), Stewart (1996, 1999), Wilkinson (in process), Woodberry (in process), World Development Indicators (“WDI” [various years]). These will be carefully culled for additional data.
For each coding, it will be necessary to assign a spatial unit – empire, country, colony, region, city, and so forth. This should follow the designation of the original source as closely as possible, unless there are reasons to assign a different coding (e.g., to retain consistent usage in the meaning of a place-name or to conform to a more reliable spatial designation than is contained in the original source).
The principal temporal unit of analysis is the territory-year. However, more precise dates (e.g., for an election) should also be noted (in the “Notes” section for each data cell), wherever available.
Most of the following variables apply to all spatial units, while some pertain only to colonies, to sovereign territories, to certain parts of the world, or to small but as yet undefined units (the core units of analysis in the GIS mapping system). Where these spatial references are not apparent, we have noted them. Coders should note wherever a variable is not applicable (we need a coding procedure for this!). We don’t want to impute missing data through Amelia for these cases and we want to be able to accurately report the amount of missing data in the sample (which should not include cases where the concept is not applicable).
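One possible coding procedure for the applicability problem noted above is sketched below. It is only an illustration, not a procedure the project has adopted: the names `CellStatus`, `Cell`, `missing_share`, and `imputation_candidates` are hypothetical. The idea is to store an explicit status with every data cell, so that "not applicable" is never confused with "missing" when data are imputed or when the share of missing data is reported.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CellStatus(Enum):
    """Hypothetical status codes for one data cell (one variable, one territory-year)."""
    OBSERVED = "observed"        # a value was found in a source
    MISSING = "missing"          # the concept applies, but no value was found
    NOT_APPLICABLE = "n/a"       # the concept does not apply to this unit


@dataclass
class Cell:
    status: CellStatus
    value: Optional[float] = None
    notes: str = ""              # e.g., precise dates, source remarks


def missing_share(cells: List[Cell]) -> float:
    """Share of applicable cells that are missing; not-applicable cells are excluded."""
    applicable = [c for c in cells if c.status is not CellStatus.NOT_APPLICABLE]
    if not applicable:
        return 0.0
    missing = sum(1 for c in applicable if c.status is CellStatus.MISSING)
    return missing / len(applicable)


def imputation_candidates(cells: List[Cell]) -> List[Cell]:
    """Only truly missing cells should be passed on to an imputation routine."""
    return [c for c in cells if c.status is CellStatus.MISSING]


# Illustrative (invented) cells for a colony-only variable such as "Distance from metropole".
cells = [
    Cell(CellStatus.OBSERVED, 1200.0),
    Cell(CellStatus.MISSING),
    Cell(CellStatus.NOT_APPLICABLE, notes="sovereign territory"),
    Cell(CellStatus.OBSERVED, 3400.0),
]
print(missing_share(cells))               # one of three applicable cells is missing
print(len(imputation_candidates(cells)))  # 1
```

Under this scheme the not-applicable cell is excluded both from the denominator of the missing-data share and from the set of cells handed to imputation.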
“Composite” variables refer to those that are constructed from other variables in the dataset.
Variables in bold are identified as crucial to our project. They will be given first priority. (Important composite variables are often composed from these primary variables; these are not in bold.)
Sources such as the WDI which cover only the contemporary period are employed for the purpose of providing contemporary data for historical time-series (garnered from other sources). As a logistical matter, the inclusion of these sources should be our last step so that we can ensure obtaining the most recent data (since these sources tend to be revised and updated annually).
Note that variable definitions often must follow those employed by our original sources. This means that a good deal of redefinition will be necessary in the following set of variables. Alternatively, we may have to collect several variables measuring approximately the same concept, but coded (by primary or secondary sources) in slightly different ways. For example, the threshold for consideration as an “urban” unit in estimates of urbanization varies.
I. Colonial Rule
First European explorers. Coding: date of arrival. Sources: Woodberry (in process).
First European missionaries. Coding: date of arrival. Sources: Woodberry (in process).
First European trading post. Coding: date of establishment of first long-term trading post.
First permanent officials. Coding: date of arrival of permanent officials from the metropole.
International status. Refers to recognized status in international (read: European) law. Coding: 1) independent, 2) colony. “Colony” includes various non-sovereign statuses such as protectorate, dominion, and territory, where the affected territory is controlled by a superior power but is not a full member of the latter’s state or empire. It does not include countries that are self-governing and have effective control over their own foreign policy. Note that the coding of this variable could be endlessly disaggregated. Stewart (1996: 1-3) suggests fifteen categories for the British Empire alone: home entity (metropole), semi-independent countries, dominions, India, territories with less self-government than dominions, crown colonies, colonies, protectorates, dependencies of dominions, dependencies of colonies, condominiums, territories, League of Nations Mandates, United Nations Mandates, and spheres of influence. Sources: Abernethy (2000: appendix), Bergesen & Schoenberg (1980), Boswell (1989), Clark (1936), Putterman (2003), Strang (1990, 1991). Note: variables constructed by Bergesen & Schoenberg (1980), Boswell (1989), and Strang (1990, 1991) are all apparently based on Henige (1970).
Identity of colonizer. Colonizer is defined as above. Coding: string. Sources: Clark (1936), Henige (1970), Strang (1990, 1991).
Effective autonomy from European powers. Coding: 1) none or very little (metropole exerts control over most aspects of domestic and foreign policy), 2) some control over domestic policies but not foreign policies (the metropole may possess veto rights over domestic policy but this is rarely employed or, if employed, is rarely sustained), 3) minimal direct interference on the part of European powers (however, there may be occasional meddling in issues of concern to the metropole and there may also be a good deal of “anticipatory” behavior on the part of the colony). Refers only to overseas empires.
Colonial military personnel (#). Coding: number of troops from the metropole (or in the service of the metropole). Source: Ray (in process).
Status within the empire. Coding: 1) colonial center, 2) semi-periphery, 3) periphery. These categories are aggregated from data on number of settlers and from qualitative observations about extent of institutional implantation. Source: Mahoney (2003). [Note: This may be redundant if we have decent figures on colonial administrators.]
Territory under foreign control. Coding: percent of territory under foreign rule. Sources: Putterman (2003).
Legal penetration. Number of customary court cases/total number of court cases. Sources: Lange (2003).
Colonial administrators (#). Coding: number of colonial administrators. A colonial administrator is defined as someone who holds a permanent position in the colonial civil service and who was born in the metropole or whose parents were born in the metropole. Sources: British Blue Books.
Colonial administrators (% of administration). Coding: number of colonial administrators as percent of total administration. Sources: Braibanti (1966: 645-7).
Colonial administrators (% of pop). Coding: number of colonial administrators as percent of total population. Composite variable.
Colonial expenditure. Expenditures by the metropole for the upkeep of a colony – all-inclusive (e.g., infrastructure, military, social policies, and administration). Coding: monetary value in metropole’s currency. Sources: Klein (1998), TePaske & Klein (1982).
Colonial profit/loss. Net profit/loss for the metropole from a colony. Sources: Klein (1998), TePaske & Klein (1982).
II. Democracy
Sources: Lawson (in process), Wilkinson (in process).
Elections. Coding: 1) no election, 2) an election.
Type of election. Coding: 1) metropole (election in colony for an office in the metropole), 2) national or colony-wide, 3) subnational (regional or local).
Suffrage (%). Coding: percent of permanent residents who are permitted (de facto and de jure) to vote.
Turnout. Coding: percent of permanent residents who vote.
III. The State
Gov revenue (#). Coding: central government tax revenue. Sources: Mitchell (1998a, 1998b, 1998c), WDI.
Gov revenue (per cap). Coding: central government revenue per capita. Composite variable.
Gov expenditure (#). Coding: central government expenditure. Sources: Mitchell (1998a, 1998b, 1998c), WDI.
Gov expenditure (per cap). Coding: central government expenditure per capita. Composite variable.
Bureaucrats (#). Coding: size of bureaucracy measured as the number of persons in regular employ, non-military only.
Bureaucrats (%). Coding: size of bureaucracy measured as the number of persons in regular employ, non-military only, as share of population. Composite variable.
Military personnel (#). Coding: number of military personnel.
Military personnel (%). Coding: number of military personnel as share of population. Composite variable.
Police (#). Coding: number of police, including all law enforcement and security officials whose jobs focus on the home front.
Police (%). Coding: number of police as share of population. Composite variable.
IV. Infrastructure
Source: Christopher Housenick (in process).
Waterways. Percent of territory reachable by navigable waterway. Sources: AJR (2002).
Roads (length). Coding: total length of all major highways (kilometers). Sources: Herbst (2000: 84-), WDI.
Railroads (length). Coding: total length of railway lines (kilometers). Sources: Mitchell (1998a, 1998b, 1998c).
Railroads (freight). Coding: total freight traffic on railways (thousand metric tons). Sources: Mitchell (1998a, 1998b, 1998c).
Railroads (traffic). Coding: passenger traffic on railways (thousands). Sources: Mitchell (1998a, 1998b, 1998c).
Post (traffic). Coding: mail items (millions). Sources: Mitchell (1998a, 1998b, 1998c).
Telegraph (traffic). Coding: telegrams (millions). Sources: Mitchell (1998a, 1998b, 1998c).
Telephones (in use). Coding: telephones in use. Sources: Mitchell (1998a, 1998b, 1998c).
V. Geography, Trade
Total land area. Coding: square kilometers. Sources: historical maps (integrated through GIS software), WDI.
Total arable land area. Source: McEvedy and Jones (1978).
Connectedness (internal). Coding: percent of population who are one day’s journey away or less (using means of transport available to persons of moderate means) from a navigable river, airport, or ocean port.
Distance from metropole. Coding: miles from colonial capital. Applicable only to colonies.
Travel time to and from metropole. Coding: shortest time required for a round-trip between metropole and colony. (Note: a trip from A to B may require more or less time than a trip from B to A because of varying tidal and wind factors.) Applicable only to colonies.
Freight shipping time to and from metropole. Coding: shortest time required for a round-trip -- bearing commodities -- between metropole and colony. Applicable only to colonies.
Global trade centrality (land). Estimated mean round-trip freight shipping time to all regions of the world (except Antarctica), with existing technology and infrastructure. A “region” may be defined in whatever way is most convenient for the analysis, so long as each region is approximately the same surface (land) area. (Note: GIS will calculate this if we can provide estimates of shipping time across water and overland routes.)
Global trade centrality (pop). As above, but each region is weighted by its population so that more populous regions count for more. Composite variable.
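The two centrality variables above differ only in how regions are weighted. The following is a minimal sketch of that calculation, assuming that shipping-time estimates per region are already available (e.g., from the GIS system). The function name `trade_centrality`, the region labels, and all numbers are hypothetical and purely illustrative.

```python
from typing import Dict, Optional


def trade_centrality(shipping_times: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Mean round-trip freight shipping time from one unit to all regions.

    shipping_times: region -> estimated round-trip time (e.g., in days).
    weights: region -> population. If None, every region counts equally,
             which corresponds to the land-based variant (regions of
             approximately equal surface area).
    """
    if weights is None:
        weights = {region: 1.0 for region in shipping_times}
    total_weight = sum(weights[region] for region in shipping_times)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(shipping_times[r] * weights[r] for r in shipping_times)
    return weighted_sum / total_weight


# Invented figures for three equal-area regions.
times = {"region_a": 10.0, "region_b": 30.0, "region_c": 50.0}
population = {"region_a": 6.0, "region_b": 3.0, "region_c": 1.0}

print(trade_centrality(times))              # land variant: 30.0
print(trade_centrality(times, population))  # population-weighted variant: 20.0
```

In this invented example the population-weighted score is lower (more central) than the land-based score because the most populous region is also the closest.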
Export revenue (raw). Coding: export revenue. Sources: Manning (1982: 4).
Export revenue (per capita). Composite variable.
Trade openness (quantitative). Coding: imports and exports as share of GDP. This includes all trade beyond the official borders of a particular unit (whatever that may be, e.g., metropole, empire, colony, nation-state). Thus, trade between the metropole and its colony is understood as foreign trade for both units; it is not, however, foreign trade if the larger empire, encompassing both, is the unit of analysis. Sources: Clark (1936), Mitchell (1998a, 1998b, 1998c), WDI.
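The dependence of trade openness on the unit of analysis can be illustrated with a short sketch. It assumes that, for each member of an empire, total imports, exports, and GDP are known, along with the total value of flows among members. The function names and all figures are hypothetical. Because each intra-empire flow is recorded once as an export by one member and once as an import by another, it is subtracted twice when the empire is the unit.

```python
from typing import Dict, List


def trade_openness(imports: float, exports: float, gdp: float) -> float:
    """Imports plus exports as a share of GDP for a single unit."""
    if gdp <= 0:
        raise ValueError("gdp must be positive")
    return (imports + exports) / gdp


def empire_trade_openness(members: List[Dict[str, float]],
                          intra_empire_flows: float) -> float:
    """Openness when the empire as a whole is the unit of analysis.

    members: one dict per member with keys "imports", "exports", "gdp".
    intra_empire_flows: total value of goods shipped between members,
        each flow counted once.
    """
    gross_trade = sum(m["imports"] + m["exports"] for m in members)
    external_trade = gross_trade - 2 * intra_empire_flows
    total_gdp = sum(m["gdp"] for m in members)
    return trade_openness(external_trade, 0.0, total_gdp)


# Invented figures: the metropole ships 20 to the colony, the colony ships 25 back.
metropole = {"imports": 50.0, "exports": 70.0, "gdp": 400.0}
colony = {"imports": 30.0, "exports": 40.0, "gdp": 100.0}

print(trade_openness(**metropole))                      # about 0.3
print(trade_openness(**colony))                         # about 0.7
print(empire_trade_openness([metropole, colony], 45.0)) # about 0.2
```

In this invented example both members look more open individually than the empire does as a whole, since trade between them no longer counts as foreign.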
Trading restrictions. Coding: 1) all foreign trade expressly forbidden and strictly enforced, 2) trade permitted only through a few entrepot trading centers and strictly limited, 3) trade within the confines of the empire, strictly enforced, 4) trade within the confines of an empire, not strictly enforced, 5) trade with all parties allowed.
VI. Economic Organization
GDP per capita. Sources: Maddison (2001), Mitchell (1998a, 1998b, 1998c), Thorp (1998), WDI.
Labor force, Subsistence. Coding: percent of labor force engaged in hunting, gathering and planting for consumption only (non-commercial).
Labor force, Plantation. Coding: percent of labor force engaged in plantation agriculture (commercial).
Labor force, Agriculture. Coding: percent of labor force engaged in agriculture (any variety).
Agriculture as % of GDP. Sources: Mitchell (1998a, 1998b, 1998c), WDI.
Mineral extraction. Coding: 1) no or little precious mineral extraction, 2) some precious mineral extraction, 3) substantial precious mineral extraction.
VII. Demography
Settler mortality. Sources: Acemoglu, et al. (2001, 2002), Curtin (1989).
[Note: see Mugar HB 885 K8 1969.]
Colonials (#). A “colonial” refers to all persons who hail from the metropole or who are generally regarded as being of that stock. Sources: Acemoglu et al. (2002), Clark (1936), Kuczynski (1953), McEvedy and Jones (1978). Coders: make sure to specify the actual nationality of the colonials in the Notes section so that there is no subsequent confusion about this.
[Note: see Mugar HB 885 K8 1969.]
Colonials (%). Coding: Colonials as percent of total population. Composite variable.
Europeans (#). A “European” refers to any person who is generally regarded as being of European stock. Sources: Acemoglu et al. (2002), Clark (1936), Kuczynski (1953), McEvedy and Jones (1978).
Indigenous people (#). Defined as people who trace ancestry to pre-colonial period.
Mixed/Mestizo people (#). Defined as offspring of liaisons between Europeans and non-Europeans who are understood to be such.
Other people (#). Residual category for non-European, non-indigenous, and non-mixed people.
Europeans/Japanese, Indigenous, and Mixed people per unit of arable land. Composite variable. [Need a shorter title for this variable.]
Population. Note: population statistics will be collected for all manner of units – nations, empires, colonies, cities, etc. Sources: Chandler (1987), Clark (1936), Kuczynski (1953), Maddison (2001), Manning (dataset in progress; will cover Africa), McEvedy and Jones (1978), Mitchell (1998a, 1998b, 1998c), Modelski (2000), WDI.
[Note: see Mugar HB 885 K8 1969; R.R. Kuczynski, The Cameroons and Togoland: A Demographic Study, 1939. “Collection of population statistics and to the demographic situation of an African area from the beginning of its colonization up to the present time.”]
Population density. Sources: Kuczynski (1953), WDI. Composite variable.
Population of largest city. Coding: population of the largest city extant within some larger unit (e.g., nation-state, colony, empire). Composite variable.
Urbanization. Coding: percent of population living in urban areas. Note: various definitions of “urban” are employed by different sources; these will be collected as separate variables and then combined in some fashion. Sources: Acemoglu et al. (2002a), Bairoch (1988), Eggimann (1999), Maddison (2001), Mitchell (1998a, 1998b, 1998c), WDI.
Stature. Coding: estimated mean adult height. Sources: Bogin (1988), Bogin and Keep (1999), Eveleth and Tanner (1991), Komlos and Baten (1998), Steckel (1995), Steckel and Rose (2002), Steckel and Floud (1997).
Birth rate. Coding: annual childbirths/population*1000. Sources: Kuczynski (1953), Maddison (2001), Mitchell (1998a, 1998b, 1998c), WDI.
Infant mortality. Coding: number of deaths of children less than 1 year old annually per thousand live births. Sources: Kuczynski (1953), Mitchell (1998a, 1998b, 1998c), WDI.
[Note: see Mugar HB 885 K8 1969.]
Mortality rate (crude). Coding: annual deaths/population*1000. Sources: Kuczynski (1953), Maddison (2001), Mitchell (1998a, 1998b, 1998c), WDI.
[Note: see Mugar HB 885 K8 1969.]
Life expectancy. Coding: life expectancy at birth, as estimated from life tables. Sources: Maddison (2001), WDI.
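The birth rate, infant mortality, and crude mortality rate defined above are all "events per thousand" measures that differ only in their denominators. A minimal sketch of the arithmetic follows. The function names and the figures are hypothetical and are not drawn from any of the listed sources.

```python
def per_thousand(events: float, base: float) -> float:
    """Number of events per 1,000 units of the base population."""
    if base <= 0:
        raise ValueError("base must be positive")
    return events * 1000 / base


def birth_rate(births: float, population: float) -> float:
    """Annual childbirths per 1,000 population."""
    return per_thousand(births, population)


def infant_mortality(infant_deaths: float, live_births: float) -> float:
    """Annual deaths of children under age 1 per 1,000 live births."""
    return per_thousand(infant_deaths, live_births)


def crude_mortality(deaths: float, population: float) -> float:
    """Total annual deaths per 1,000 population."""
    return per_thousand(deaths, population)


# Invented figures for one territory-year.
population = 100_000
births = 4_000
deaths = 2_500
infant_deaths = 600

print(birth_rate(births, population))          # 40.0
print(infant_mortality(infant_deaths, births)) # 150.0
print(crude_mortality(deaths, population))     # 25.0
```

Note that infant mortality uses live births, not total population, as its base, so it cannot be derived from the crude mortality rate alone.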
VIII. Education
General source: Woodberry (in process).
Literacy. Coding: the percentage of people aged 15+ who can, with understanding, both read and write a short, simple statement on their everyday life. (Coded as zero if there is no generally recognized written language.) Sources: Eisenstadt & Rokkan (1973: 245-47), United Nations Educational, Scientific and Cultural Organization (1957), WDI.
Schooling (primary/secondary). Coding: the number of children in school as percent of school-age population (or, where unavailable, of general population). Sources: Mitchell (1998a, 1998b, 1998c), PWT, WDI.
Schooling (university). Coding: the number of university students as percent of total population. Sources: Mitchell (1998a, 1998b, 1998c), PWT, WDI.
Schooling (overseas). Coding: the number of persons educated (for some period of time) in a university located in a metropole. Sources: Kurzman & Leahey (2004).
Schooling (system type). Coding: public secular (%), public denominational (%), private secular (%), private denominational (%).
IX. Religion
General source: Woodberry (in process).
Missionaries (#). Coding: for each missionary group, the approximate number of missionaries that they had in the field.
Religion (%). Coding: list all major religions and the approximate percent of the population who adhered to, or were born into, each.
Religion (geo). Coding: major religions and the territory in which their adherents lived. A “major” religion refers to the first or second most commonly practiced religion in a region. Sources: historical maps.
Religions (#). Coding: number of major religions. Composite variable (from GIS mapping).
Religion (% area). Coding: percent of territory where a particular religion is practiced. Composite variable (from GIS mapping).
Religion (% area of dominant religion). Coding: percent of territory where the dominant religion is practiced. Composite variable (from GIS mapping).
X. Language
Language. Coding: list all major languages and the approximate percent of the population who are speakers of each. Percentages may sum to more than 100, since some people speak more than one language.
Language (geo). Coding: show all major languages and the territory in which their speakers lived. A “major” language is one that is understood by a majority of inhabitants in a given territory. Sources: historical maps.
Languages (#). Coding: number of major languages. Composite variable (from GIS mapping).
Language (% area). Coding: percent of territory where a particular language is spoken. Composite variable (from GIS mapping).
Language (% area of largest language). Coding: percent of territory encompassed by the major language with the largest spatial coverage. Composite variable (from GIS mapping).
Linguistic distance. Coding: distance between major languages, as understood by linguists. See recent work by Fearon and Laitin.
XI. Ethnicity
Ethnicity (geo). Coding: show all major ethnicities and the territory in which their members lived. Sources: historical maps.
Ethnic groups (#). Coding: number of major ethnic groups. Composite variable (from GIS mapping).
Ethnicity (% area). Coding: percent of territory inhabited by a particular ethnic group. Composite variable (from GIS mapping).
Ethnicity (% area of largest ethnicity). Coding: percent of territory encompassed by the ethnic group with the largest spatial coverage. Composite variable (from GIS mapping).
XII. Slavery
Note: Patrick Manning will offer advice on sources and coding of this section. See various bibliographies compiled by Joseph C. Miller.
Slavery (#). Coding: number of slaves.
Slavery (%). Coding: number of slaves as percent of population. Composite variable.
Slavery (qual). Coding: 1) no or very little slavery or slave-trading, 2) extensive slave-trading but only a small permanent slave population, 3) extensive slave population.