How can I assign a factor in a new variable based on whether a given string is contained in another variable (R)?

Question

Using the European Social Survey (ESS) I am trying to clean up my data in RStudio. I used paste() to combine three separate columns, prtvtDK/prtvtNO/prtvtSE (party voted for in most recent election, Denmark/Norway/Sweden respectively) into one, fullprtvt, which has about 38000 individual cases across 8 waves.

What I want to do is assign each individual a factor of either "Populist" or "Nonpopulist" depending on which party they voted for at the last election. The party names changed a few waves into the survey, so there are multiple names for the same party. These are the names used for the populist parties:

'Dansk Folkeparti'
'Dansk Folkeparti - Danish peoples party'
'Sverigedemokraterna'
'Progress Party (FrP)'
'Progress Party (FRP)'

Essentially, I want anyone who voted for one of the above parties to be classified as "Populist" in the new variable. I was thinking that some kind of ifelse() function or if else statement could work, but reading through other posts I saw people suggesting using the switch() function instead. What's the best way to tackle this problem?

Allan Cameron · Accepted Answer

You can use grep() to find the rows whose party name contains one of the strings shared by the populist parties.

Here, I am assuming your data frame is called df and that fullprtvt is a column in that data frame. You'll need to ensure it is a character column. If not, you can alter df$fullprtvt to as.character(df$fullprtvt) in the following:

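# Row indices whose party name contains any of the populist strings
pop <- grep('Dansk Folkeparti|Sverigedemokraterna|Progress Party', df$fullprtvt)
# Default every row to "Non-populist", then overwrite the matched rows
df$pop_nonpop <- rep("Non-populist", nrow(df))
df$pop_nonpop[pop] <- "Populist"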
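
Since the question asks for a factor and mentions ifelse(), the same match could also be written with grepl(), which returns a logical vector rather than row indices. This is only a minimal sketch: the small data frame and the column name pop_factor are made up for illustration, and the level order is an arbitrary choice.

```r
# Toy data standing in for the real ESS data frame (assumed values)
df <- data.frame(fullprtvt = c("Dansk Folkeparti",
                               "Socialist",
                               "Progress Party (FRP)",
                               "Sverigedemokraterna",
                               "Moderate party"),
                 stringsAsFactors = FALSE)

# grepl() gives TRUE/FALSE for every row, so it plugs straight into ifelse()
is_pop <- grepl("Dansk Folkeparti|Sverigedemokraterna|Progress Party",
                df$fullprtvt)

# factor() turns the character result into a factor with explicit levels
df$pop_factor <- factor(ifelse(is_pop, "Populist", "Non-populist"),
                        levels = c("Non-populist", "Populist"))

table(df$pop_factor)
```

Setting the levels explicitly keeps both categories present even in a subset where one of them happens not to occur.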

Here's a reproducible example using some dummy data:

set.seed(69)
df <- data.frame(fullprtvt = sample(c('Dansk Folkeparti',
                                      'Dansk Folkeparti - Danish peoples party',
                                      'Sverigedemokraterna',
                                      'Progress Party (FrP)',
                                      'Progress Party (FRP)',
                                      'Moderate party',
                                      'Communist',
                                      'Socialist',
                                      'Democrat'), 10, TRUE))

df
#>                                  fullprtvt
#> 1                         Dansk Folkeparti
#> 2  Dansk Folkeparti - Danish peoples party
#> 3                                Socialist
#> 4                                Communist
#> 5                                Communist
#> 6                           Moderate party
#> 7                                Communist
#> 8  Dansk Folkeparti - Danish peoples party
#> 9                     Progress Party (FrP)
#> 10 Dansk Folkeparti - Danish peoples party

And here is how the above code works on it:

pop <- grep('Dansk Folkeparti|Sverigedemokraterna|Progress Party', df$fullprtvt)
df$pop_nonpop <- rep("Non-populist", nrow(df))
df$pop_nonpop[pop] <- "Populist"

df
#>                                  fullprtvt   pop_nonpop
#> 1                         Dansk Folkeparti     Populist
#> 2  Dansk Folkeparti - Danish peoples party     Populist
#> 3                                Socialist Non-populist
#> 4                                Communist Non-populist
#> 5                                Communist Non-populist
#> 6                           Moderate party Non-populist
#> 7                                Communist Non-populist
#> 8  Dansk Folkeparti - Danish peoples party     Populist
#> 9                     Progress Party (FrP)     Populist
#> 10 Dansk Folkeparti - Danish peoples party     Populist

Created on 2020-02-20 by the reprex package (v0.3.0)
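
Because the pattern matches any value that contains one of the three substrings, it may be worth checking which party labels actually ended up in each group before relying on the new column. A minimal sketch of such a check, using the same df, fullprtvt and pop_nonpop names as above; the toy values are made up so the snippet runs on its own, and check is just an illustrative name.

```r
# Toy data (assumed values), classified the same way as in the answer
df <- data.frame(fullprtvt = c("Dansk Folkeparti",
                               "Dansk Folkeparti - Danish peoples party",
                               "Socialist",
                               "Progress Party (FrP)",
                               "Communist"),
                 stringsAsFactors = FALSE)
pop <- grep("Dansk Folkeparti|Sverigedemokraterna|Progress Party", df$fullprtvt)
df$pop_nonpop <- rep("Non-populist", nrow(df))
df$pop_nonpop[pop] <- "Populist"

# Cross-tabulate: rows are party labels, columns are the assigned group
check <- table(df$fullprtvt, df$pop_nonpop)
check

# Distinct party labels that were classified as populist
unique(df$fullprtvt[df$pop_nonpop == "Populist"])
```

Any label that shows up under "Populist" unexpectedly would suggest the pattern is too broad and needs tightening.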
