Reputation: 3
Using the European Social Survey (ESS) I am trying to clean up my data in RStudio. I used paste() to combine three separate columns, prtvtDK/prtvtNO/prtvtSE (party voted for in most recent election, Denmark/Norway/Sweden respectively) into one, fullprtvt, which has about 38000 individual cases across 8 waves.
What I want to do is assign each individual a factor of either "Populist" or "Nonpopulist" depending on which party they voted for at the last election. In their data-gathering, the party names changed a few waves in so there are multiple names for the same party, but I have a list of the ones used for the populist parties that I list below:
Essentially, I want anyone who voted for one of the above parties to be classified as "Populist" in the new variable. I was thinking that some kind of ifelse() function or if else statement could work, but reading through other posts I saw people suggesting using the switch() function instead. What's the best way to tackle this problem?
Upvotes: 0
Views: 397
Reputation: 704
If you have a list, you can use dplyr::case_when syntax:
library(dplyr)

# Dummy data: 10 party names drawn at random from the vector below
df <- data.frame(fullprtvt = sample(c('Dansk Folkeparti',
                                      'Dansk Folkeparti - Danish peoples party',
                                      'Sverigedemokraterna',
                                      'Progress Party (FrP)',
                                      'Progress Party (FRP)',
                                      'Moderate party',
                                      'Communist',
                                      'Socialist',
                                      'Democrat'), 10, TRUE))

# Lookup vectors: the full party names that belong to each group
left <- c(
  'Sverigedemokraterna',
  'Progress Party (FrP)',
  'Progress Party (FRP)',
  'Moderate party',
  'Communist',
  'Socialist')
right <- c(
  'Dansk Folkeparti',
  'Dansk Folkeparti - Danish peoples party')

df %>%
  mutate(fullprtvt = as.character(fullprtvt)) %>%   # in case the column is a factor
  mutate(affiliation =
           case_when(
             fullprtvt %in% left  ~ "left",
             fullprtvt %in% right ~ "right",
             TRUE                 ~ fullprtvt       # anything unmatched keeps its party name
           ))
This will produce output like the following (the rows will vary, because sample() is called without a seed):
                                 fullprtvt affiliation
1                         Dansk Folkeparti       right
2                         Dansk Folkeparti       right
3                                Socialist        left
4  Dansk Folkeparti - Danish peoples party       right
5                     Progress Party (FrP)        left
6                     Progress Party (FrP)        left
7                     Progress Party (FrP)        left
8                           Moderate party        left
9                      Sverigedemokraterna        left
10 Dansk Folkeparti - Danish peoples party       right
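For the two-level Populist/Nonpopulist variable asked about in the question, the same %in% lookup can be reduced to two labels and converted to a factor. This is only a minimal sketch: populist_names is a hypothetical vector and the spellings in it are assumptions taken from the dummy data here, so they would need to be replaced with the names that actually occur in your ESS waves. Missing values are kept as NA rather than being counted as Nonpopulist.

```r
library(dplyr)

# Hypothetical lookup vector: replace with the exact spellings in your data
populist_names <- c(
  "Dansk Folkeparti",
  "Dansk Folkeparti - Danish peoples party",
  "Sverigedemokraterna",
  "Progress Party (FrP)",
  "Progress Party (FRP)"
)

# Small illustrative data frame (not real ESS data)
df <- data.frame(
  fullprtvt = c("Dansk Folkeparti", "Socialist", "Progress Party (FRP)", NA),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    pop_nonpop = case_when(
      is.na(fullprtvt)              ~ NA_character_,  # keep missing votes missing
      fullprtvt %in% populist_names ~ "Populist",
      TRUE                          ~ "Nonpopulist"
    ),
    # convert to a factor with a fixed level order
    pop_nonpop = factor(pop_nonpop, levels = c("Nonpopulist", "Populist"))
  )

df
```

Because case_when() uses the first condition that is TRUE, the is.na() check has to come before the catch-all TRUE line.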
This can be expanded for strings in names of the parties:
library(stringr)  # provides str_extract() and str_to_lower()

# Keyword fragments to look for in the party names
# (Hmisc::Cs() builds a character vector from unquoted names)
left <- Hmisc::Cs(progres, commun, soc)
centre <- Hmisc::Cs(moderat, demo)
right <- Hmisc::Cs(people, folk)
regexStr <- paste0(c(left, centre, right), collapse = "|")

df %>%
  mutate(fullprtvt = as.character(fullprtvt)) %>%
  mutate(shortprtvt =
           str_extract(
             string = str_to_lower(fullprtvt),   # match case-insensitively
             pattern = regexStr)) %>%            # first keyword found in the name
  mutate(affiliation =
           case_when(
             shortprtvt %in% left   ~ "left",
             shortprtvt %in% centre ~ "centre",
             shortprtvt %in% right  ~ "right",
             TRUE                   ~ fullprtvt
           ))
We get the result below. Note that the regex and case_when can be tweaked further, for example to handle names that contain two keywords, where one signals left and the other signals, say, agrarian, so that the party should be classified as centre or right rather than left.
                                 fullprtvt shortprtvt affiliation
1  Dansk Folkeparti - Danish peoples party       folk       right
2                                 Democrat       demo      centre
3                     Progress Party (FRP)    progres        left
4                     Progress Party (FRP)    progres        left
5                                Socialist        soc        left
6                                 Democrat       demo      centre
7                     Progress Party (FRP)    progres        left
8                         Dansk Folkeparti       folk       right
9                      Sverigedemokraterna       demo      centre
10                     Sverigedemokraterna       demo      centre
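In this output "Sverigedemokraterna" ends up as centre because "demo" is the first keyword found in the lower-cased name. One possible way to control such collisions is to test patterns with str_detect() directly inside case_when(), putting the more specific pattern first so that it wins. The following is a sketch only: the pattern "sverigedemokrat", the label "populist" and the column name_lower are illustrative assumptions, not part of the code above.

```r
library(dplyr)
library(stringr)

# Small illustrative data frame (not real ESS data)
df <- data.frame(
  fullprtvt = c("Sverigedemokraterna", "Democrat", "Socialist", "Moderate party"),
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    name_lower = str_to_lower(fullprtvt),
    affiliation = case_when(
      # specific rule first: case_when() stops at the first TRUE condition
      str_detect(name_lower, "sverigedemokrat") ~ "populist",
      str_detect(name_lower, "moderat|demo")    ~ "centre",
      str_detect(name_lower, "soc|commun")      ~ "left",
      TRUE                                      ~ fullprtvt
    )
  )

df
```

With this ordering the broad "demo" rule is only reached by names that did not already match the more specific rule above it.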
Upvotes: 1
Reputation: 174278
You can use grep to identify common strings in the populist-labelled parties. Here, I am assuming your data frame is called df and that fullprtvt is a column in that data frame. You'll need to ensure it is a character column. If not, you can alter df$fullprtvt to as.character(df$fullprtvt) in the following:
pop <- grep('Dansk Folkeparti|Sverigedemokraterna|Progress Party', df$fullprtvt)
df$pop_nonpop <- rep("Non-populist", nrow(df))
df$pop_nonpop[pop] <- "Populist"
Here's a reproducible example of some dummy data:
set.seed(69)
df <- data.frame(fullprtvt = sample(c('Dansk Folkeparti',
                                      'Dansk Folkeparti - Danish peoples party',
                                      'Sverigedemokraterna',
                                      'Progress Party (FrP)',
                                      'Progress Party (FRP)',
                                      'Moderate party',
                                      'Communist',
                                      'Socialist',
                                      'Democrat'), 10, TRUE))
df
#>                                  fullprtvt
#> 1                         Dansk Folkeparti
#> 2  Dansk Folkeparti - Danish peoples party
#> 3                                Socialist
#> 4                                Communist
#> 5                                Communist
#> 6                           Moderate party
#> 7                                Communist
#> 8  Dansk Folkeparti - Danish peoples party
#> 9                     Progress Party (FrP)
#> 10 Dansk Folkeparti - Danish peoples party
And how the above code would work:
pop <- grep('Dansk Folkeparti|Sverigedemokraterna|Progress Party', df$fullprtvt)
df$pop_nonpop <- rep("Non-populist", nrow(df))
df$pop_nonpop[pop] <- "Populist"
df
#>                                  fullprtvt   pop_nonpop
#> 1                         Dansk Folkeparti     Populist
#> 2  Dansk Folkeparti - Danish peoples party     Populist
#> 3                                Socialist Non-populist
#> 4                                Communist Non-populist
#> 5                                Communist Non-populist
#> 6                           Moderate party Non-populist
#> 7                                Communist Non-populist
#> 8  Dansk Folkeparti - Danish peoples party     Populist
#> 9                     Progress Party (FrP)     Populist
#> 10 Dansk Folkeparti - Danish peoples party     Populist
Created on 2020-02-20 by the reprex package (v0.3.0)
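Since the question asks for a factor and mentions ifelse(), here is a variant of the same idea using grepl(), which returns one TRUE/FALSE per row instead of row indices. Treat it as a sketch under assumptions: the small data frame is made up for illustration, and the choice to keep missing votes as NA (rather than "Non-populist") is mine, not something the question specifies.

```r
# Small illustrative data frame (not real ESS data)
df <- data.frame(
  fullprtvt = c("Dansk Folkeparti", "Communist", "Progress Party (FrP)", NA),
  stringsAsFactors = FALSE
)

pattern <- "Dansk Folkeparti|Sverigedemokraterna|Progress Party"

# grepl() gives a logical vector the same length as the column
is_pop <- grepl(pattern, df$fullprtvt)

labels <- ifelse(is_pop, "Populist", "Non-populist")

# grepl() returns FALSE for NA input, so reset those rows to NA explicitly
labels[is.na(df$fullprtvt)] <- NA

df$pop_nonpop <- factor(labels, levels = c("Non-populist", "Populist"))

table(df$pop_nonpop, useNA = "ifany")
```

If the combined column was built with paste(), missing values may have become the literal string "NA" rather than a true NA, so it is worth checking for that before relying on is.na().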
Upvotes: 3