polaR Package
Accompanying the encyclopedia, our R package polaR makes all our code available online and lets other researchers reproduce and test our knowledge on polarization more easily. Those who just want to access the data or gain an overview of polarization trends over time can also just look at our visualization tool.
Introduction & Installation
In this Notebook, you can find the general use case of the polaR package. The package is a work in progress and will be updated and expanded over time. We very much welcome any kind of feedback and suggestions for improvement.
Right now, it is only available on GitLab. While we are working on polishing the package and making it available on CRAN, you can install it from here using devtools:
devtools::install_gitlab("felixgruenewald/polaR")
Workflow
The package is built around two steps: importing data from different sources and computing a range of polarization measures on them. It aims to be as user-friendly as possible and to work with many different datasets, but some measures are only available for specific data structures and variables, so the mileage varies depending on the use case.
In general, the workflow is as follows:
- Import data from a locally downloaded dataset using polaR_import().
- Compute measures on the dataset using the respective functions. You can find a list of the functions and their compatibilities below.
Importing Data
Different datasets use different variable names and scales for the same items. The package therefore contains an import function that takes a range of datasets and recodes names, scales and missing values into a consistent pattern, on the basis of which the measures can be computed. As of now, it only works with .dta files.
For this, the function looks up the variable names for the dataset specified under the source option in a pre-defined dictionary containing all relevant variables, and transforms them accordingly.
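The dictionary lookup described above can be illustrated with a small base R sketch. It is not the package's implementation: the dictionary layout, the source labels, the variable names and the missing-value codes are all hypothetical and do not mirror polaR::var_dict.

```r
# Toy dictionary: one row per (source, original variable) pair.
# All names are hypothetical and do not mirror polaR::var_dict.
var_dict_toy <- data.frame(
  source   = c("survey_a", "survey_a", "survey_b", "survey_b"),
  original = c("A3001", "A3002", "lrscale", "cntry"),
  name     = c("leftright", "country", "leftright", "country"),
  stringsAsFactors = FALSE
)

# Keep the dictionary variables found in `data`, rename them to the
# harmonized names, and set the assumed missing-value codes to NA.
harmonize <- function(data, source, dict, missing_codes = c(97, 98, 99)) {
  rows <- dict[dict$source == source & dict$original %in% names(data), ]
  out <- data[, rows$original, drop = FALSE]
  names(out) <- rows$name
  for (col in names(out)) {
    if (is.numeric(out[[col]])) {
      out[[col]][out[[col]] %in% missing_codes] <- NA
    }
  }
  out
}

raw_a <- data.frame(
  A3001 = c(2, 99, 7),
  A3002 = c("DE", "DE", "FR"),
  stringsAsFactors = FALSE
)
clean_a <- harmonize(raw_a, source = "survey_a", dict = var_dict_toy)
print(clean_a)
```

Adding a dataset in this sketch only requires new dictionary rows, which is the same idea as extending the package dictionary.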
As of now, the following datasets are available:
Dataset | source |
---|---|
CSES IMD | cses_imd |
CSES 5 | cses_5 |
ESS | ess |
CHES | ches |
MARPOR | cmp |
Eurobarometer | eb |
polaR::var_dict shows the renaming pattern and the variables used for this. You can load and adapt or add to the dictionary to increase the number of variables or datasets compatible with the package. With the keep_all option, you can choose between importing all variables in the dataset or just those that are relevant to the computation of measures.
cses <- polaR_import(source = "cses_imd", path = "path/to/cses_imd.dta", keep_all = TRUE)
ess <- polaR_import(source = "ess", path = "path/to/ess.dta", keep_all = FALSE)
Note: The ESS is a difficult case, as the waves are available individually. Some of these waves do not contain the relevant variables for the polaR functions and therefore, the measures won’t compute on these datasets. To generate the data from the encyclopedia, we merged the 10 waves together:
ess1 <- polaR_import(source = "ess", path = "path/to/ess1.dta", keep_all = FALSE)
ess2 <- polaR_import(source = "ess", path = "path/to/ess2.dta", keep_all = FALSE)
# ... (waves 3 to 9 are imported in the same way)
ess10 <- polaR_import(source = "ess", path = "path/to/ess10.dta", keep_all = FALSE)
ess <- rbind(ess1, ess2, ess3, ess4, ess5, ess6, ess7, ess8, ess9, ess10)
rm(ess1, ess2, ess3, ess4, ess5, ess6, ess7, ess8, ess9, ess10)
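The same merge can be written more compactly with a loop over file paths. The sketch below uses a stand-in function, import_wave(), which is hypothetical and returns toy data so that the example runs without the package or the ESS files. In practice its body would be the polaR_import() call shown in the comment.

```r
# Stand-in for polaR_import(source = "ess", path = path, keep_all = FALSE).
# It returns toy data so the sketch runs without the package or the files.
import_wave <- function(path) {
  data.frame(
    wave_file = basename(path),
    leftright = c(3, 5, 8),
    stringsAsFactors = FALSE
  )
}

# File names follow the pattern used above; adjust the folder as needed.
paths <- file.path("path/to", paste0("ess", 1:10, ".dta"))

# Import every wave into a list, then stack the list row-wise.
waves <- lapply(paths, import_wave)
ess_toy <- do.call(rbind, waves)

nrow(ess_toy)                      # 3 toy rows per wave, 10 waves
length(unique(ess_toy$wave_file))  # one file name per wave
```

Note that rbind() requires all waves to share the same columns, so this pattern assumes the import step yields a consistent set of variables for every wave.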
Measures
polaR includes a range of different measures that can be computed with various datasets. Some measures, like the CSES polarization index, are linked to a specific dataset. Others, like the standard deviation of issue self-placements of respondents or spread of party positions, can be computed with different data sources. Wherever it is possible to compute a measure, the package offers the possibility to do so.
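# Use distinct object names so the results do not overwrite each other or mask the function
sd_mass_cses <- sd_mass(dataset = cses, issue = "leftright")
sd_mass_ess <- sd_mass(dataset = ess, issue = "leftright")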
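To make the idea behind this measure concrete, here is a minimal base R sketch of a standard deviation of left-right self-placements per country-year. It runs on toy data with hypothetical column names and uses the plain, unweighted sample standard deviation. Whether sd_mass() applies survey weights or other adjustments is not stated here, so treat this as an illustration only.

```r
# Toy respondent-level data with hypothetical column names.
toy <- data.frame(
  country   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  year      = c(2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020),
  leftright = c(0, 10, 0, 10, 4, 5, 5, NA)
)

# One standard deviation per country-year; the formula interface of
# aggregate() drops rows with missing values by default.
sd_by_country_year <- aggregate(leftright ~ country + year, data = toy, FUN = sd)
names(sd_by_country_year)[3] <- "sd_leftright"
print(sd_by_country_year)
```

Country A, where respondents sit at the two ends of the scale, gets a much larger value than country B, where they cluster around the centre.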
The table below gives a quick overview of which measures are currently compatible with which datasets.
Measure | Function | CSES (5 & IMD) | CHES | Eurobarometer | ESS | MARPOR |
---|---|---|---|---|---|---|
API | api() | ✅ | ||||
CSES Polarization Index | cpi() | ✅ | ||||
Dispersion | dispersion() | ✅ | ||||
Distance | distance() | ✅ | ||||
Range | range_parties() | ✅ | ||||
SD (Mass Attitudes) | sd_mass() | ✅ | ✅ | ✅ | ||
SD (Experts) | sd_experts() | ✅ | ✅ | ✅ | ||
SD (Party Perceptions) | sd_partyperception() | ✅ | ||||
Spread | spread_likedislike() | ✅ | ||||
Variance | variance() | ✅ |
Individual & Aggregate Level
Some measures work on the individual level, i.e., they add an additional polarization variable for every respondent in the original dataset. This is the case for, e.g., the Spread of Like-Dislike scores by Wagner.
With aggregate, you can choose whether the function returns the full dataset with the additional individual-level variable or a country-year aggregation of the measure.
cses <- spread_likedislike(cses, weighted = TRUE, aggregate = FALSE)
agg_spread <- spread_likedislike(cses, weighted = TRUE, aggregate = TRUE)
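The difference between the two outputs can be sketched in base R. The example assumes that the spread is the standard deviation of a respondent's like-dislike scores across parties, optionally weighted by party vote shares, which is how Wagner's measure is commonly written. Whether spread_likedislike() matches this in every detail (for example in its handling of missing scores) is an assumption, and all data and column names are made up.

```r
# Toy like-dislike ratings (0-10) of three parties by three respondents.
likes <- data.frame(
  respondent = 1:3,
  like_A = c(10, 5, 0),
  like_B = c(0, 5, 0),
  like_C = c(5, 5, 10)
)
# Hypothetical vote shares of the three parties (sum to 1).
vote_share <- c(A = 0.5, B = 0.3, C = 0.2)

# Spread of one respondent's scores around their (weighted) mean score.
# Without weights, every party counts equally.
spread_one <- function(scores, weights = NULL) {
  if (is.null(weights)) {
    weights <- rep(1 / length(scores), length(scores))
  }
  centre <- sum(weights * scores)
  sqrt(sum(weights * (scores - centre)^2))
}

score_cols <- c("like_A", "like_B", "like_C")

# Individual level: one new variable per respondent.
likes$spread     <- apply(likes[, score_cols], 1, spread_one)
likes$spread_wgt <- apply(likes[, score_cols], 1, spread_one, weights = vote_share)
print(likes)

# Aggregate level: average the individual values (here over all rows;
# with real data this would be done per country-year).
agg_spread_toy <- mean(likes$spread_wgt)
agg_spread_toy
```

Respondent 2 rates all parties the same and therefore has a spread of zero in both versions.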
Other measures are computed directly on the country-year level and have no individual-level version. Examples are the standard deviation of respondents’ left-right self-placement, the CSES Polarization Index by Dalton, and the API by Reiljan:
sd_mass_lr <- sd_mass(cses, issue = "leftright")
polarization_index <- cpi(cses)
api_cses <- api(cses)
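As an illustration of a measure that only exists at the party-system level, the sketch below computes a polarization index in the form usually attributed to Dalton: the square root of the vote-share-weighted sum of squared distances between each party's left-right position and the system mean, with distances divided by 5 and vote shares expressed in percent. This is written from the published formula as commonly cited, not from the cpi() source code, and the party data are invented.

```r
# Toy party system: perceived left-right position (0-10) and vote share (%).
parties <- data.frame(
  party      = c("A", "B", "C"),
  position   = c(2, 5, 9),
  vote_share = c(40, 35, 25)
)

dalton_index <- function(position, vote_share) {
  # Rescale so that the shares sum to 100, then weight by them.
  share <- 100 * vote_share / sum(vote_share)
  system_mean <- sum(share * position) / 100
  sqrt(sum(share * ((position - system_mean) / 5)^2))
}

dalton_index(parties$position, parties$vote_share)

# Boundary cases: two equally sized parties at the extremes give the
# maximum of 10; parties sharing one position give 0.
dalton_index(c(0, 10), c(50, 50))
dalton_index(c(5, 5), c(50, 50))
```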
For yet another kind of measure, such as range_parties(), you can choose whether to compute it on the individual-level party scores or on previously aggregated mean scores of party positions. For these cases, the level option is also available.
range_ind <- range_parties(cses, issue = "leftright", aggregate = TRUE, level = "individual")
range_agg <- range_parties(cses, issue = "leftright", level = "aggregate")
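The two levels can give quite different results. The base R sketch below shows one reading of the distinction: at the individual level, the range is taken within each respondent's perceived party positions and then averaged; at the aggregate level, party positions are averaged across respondents first and the range is taken afterwards. This interpretation is inferred from the description above rather than from the range_parties() code, and the data are made up.

```r
# Toy perceived left-right positions (0-10) of three parties.
percep <- data.frame(
  respondent = 1:3,
  lr_A = c(1, 2, 9),
  lr_B = c(5, 5, 5),
  lr_C = c(9, 7, 1)
)
cols <- c("lr_A", "lr_B", "lr_C")

# Individual level: range per respondent, then the mean of those ranges.
range_per_respondent <- apply(percep[, cols], 1, function(x) max(x) - min(x))
range_individual <- mean(range_per_respondent)

# Aggregate level: mean position per party, then the range of the means.
party_means <- colMeans(percep[, cols])
range_aggregate <- max(party_means) - min(party_means)

range_individual
range_aggregate
```

Because respondent 3 places parties A and C in the reverse order, the party means move towards the centre and the aggregate-level range is much smaller than the average individual-level range.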
Weighted Measures
Some measures have weighted variations. With weighted, you can toggle between the two versions.
spread_wgt <- spread_likedislike(cses, weighted = TRUE, aggregate = TRUE)
spread <- spread_likedislike(cses, weighted = FALSE, aggregate = TRUE)
Expert measures
Expert measures are somewhat of a special case, as they follow a different logic than the other measures. Expert data is available by party, not by respondent. CSES, a respondent-based dataset, therefore contains many duplicate data points for the same party, as the expert scores are repeated for each individual respondent. CHES data, on the other hand, uses the parties as units of observation. For such data, we cannot rely on the usual coding pattern using party_A, party_B, etc., but need a dedicated function. The function to compute measures on expert data therefore requires you to specify whether the dataset units are respondents (CSES, ESS…) or parties (CHES, MARPOR).
sd_expert <- sd_experts(cses, issue = "leftright", units = "respondent")
sd_expert_galtan_ches <- sd_experts(ches, issue = "galtan", units = "party")
sd_expert_lrecon_ches <- sd_experts(ches, issue = "lrecon", units = "party")
sd_expert_lr_marpor <- sd_experts(marpor, issue = "leftright", units = "party")
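Why the units matter can be seen in a small base R sketch. In respondent-based data, the expert score of each party is repeated on every row, so the rows have to be reduced to one per country-year before the spread across parties is computed. In party-based data, each row already is one party. The column names and values are hypothetical, and the unweighted standard deviation is an assumption about what sd_experts() computes.

```r
# Respondent units: expert scores for parties A-C repeated on every row.
resp_units <- data.frame(
  country = "A", year = 2019, respondent = 1:4,
  expert_A = 2, expert_B = 5, expert_C = 9
)

# Party units: one row per party.
party_units <- data.frame(
  country = "A", year = 2019,
  party = c("A", "B", "C"),
  position = c(2, 5, 9)
)

expert_cols <- c("expert_A", "expert_B", "expert_C")

# Respondent units: drop the duplicated rows, then take the SD across
# the party columns of the remaining country-year row.
one_row <- unique(resp_units[, c("country", "year", expert_cols)])
sd_from_respondents <- apply(one_row[, expert_cols], 1, sd)

# Party units: take the SD across party rows within each country-year.
sd_from_parties <- aggregate(position ~ country + year, data = party_units, FUN = sd)

unname(sd_from_respondents)
sd_from_parties$position
```

Both routes give the same value once the duplicates are removed.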
Issue Dimension
Measures like the perception of party positions can have different issue dimensions. With issue, you can define which dimension the measure should be computed on.
range_lr <- range_parties(cses, issue = "leftright", aggregate = TRUE)
sd_expert_galtan_ches <- sd_experts(ches, issue = "galtan", units = "party")