Old Sol

Reputation: 3

How can I assign a factor in a new variable based on whether a given string is contained in another variable (R)?

Using the European Social Survey (ESS) I am trying to clean up my data in RStudio. I used paste() to combine three separate columns, prtvtDK/prtvtNO/prtvtSE (party voted for in most recent election, Denmark/Norway/Sweden respectively) into one, fullprtvt, which has about 38000 individual cases across 8 waves.

What I want to do is assign each individual a factor of either "Populist" or "Nonpopulist" depending on which party they voted for at the last election. The party names changed a few waves into the data collection, so the same party appears under several names, but I have a list of the names used for the populist parties:

Essentially, I want anyone who voted for one of the above parties to be classified as "Populist" in the new variable. I was thinking that some kind of ifelse() function or if else statement could work, but reading through other posts I saw people suggesting using the switch() function instead. What's the best way to tackle this problem?

Upvotes: 0

Views: 397

Answers (2)

Jacek Kotowski

Reputation: 704

If you have a list, you can use dplyr::case_when syntax:

library(dplyr)

df <- data.frame(fullprtvt = sample(c('Dansk Folkeparti',
                                      'Dansk Folkeparti - Danish peoples party',
                                      'Sverigedemokraterna',
                                      'Progress Party (FrP)',
                                      'Progress Party (FRP)',
                                      'Moderate party',
                                      'Communist',
                                      'Socialist',
                                      'Democrat'), 10, TRUE))

left <- c(
  'Sverigedemokraterna',
  'Progress Party (FrP)',
  'Progress Party (FRP)',
  'Moderate party',
  'Communist',
  'Socialist')
right <- c(
  'Dansk Folkeparti',
  'Dansk Folkeparti - Danish peoples party')

df %>%
  mutate(fullprtvt=as.character(fullprtvt)) %>% 
  mutate(affiliation =
           case_when(
             fullprtvt %in% left ~ "left",
             fullprtvt %in% right ~ "right",
             TRUE ~ fullprtvt
           ))

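This will produce something like the following (the rows are sampled at random without a seed, so they will differ between runs):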

                                 fullprtvt affiliation
1                         Dansk Folkeparti       right
2                         Dansk Folkeparti       right
3                                Socialist        left
4  Dansk Folkeparti - Danish peoples party       right
5                     Progress Party (FrP)        left
6                     Progress Party (FrP)        left
7                     Progress Party (FrP)        left
8                           Moderate party        left
9                      Sverigedemokraterna        left
10 Dansk Folkeparti - Danish peoples party       right
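
For the two-level factor the question asks for, the same %in% test can be used in base R. This is only a minimal sketch: populist_names and the votes data frame are made up for illustration from the dummy names above, not taken from the real ESS coding, and it assumes the column holds exactly the party name.

```r
# Hypothetical lookup vector: the exact spellings treated as populist here.
# These come from the dummy data in this answer, not from the ESS codebook.
populist_names <- c("Dansk Folkeparti",
                    "Dansk Folkeparti - Danish peoples party",
                    "Sverigedemokraterna",
                    "Progress Party (FrP)",
                    "Progress Party (FRP)")

# Small made-up data frame, including one missing vote
votes <- data.frame(
  fullprtvt = c("Dansk Folkeparti", "Socialist", "Progress Party (FRP)", NA),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA, so missing votes are kept missing explicitly
votes$populism <- ifelse(is.na(votes$fullprtvt), NA_character_,
                         ifelse(votes$fullprtvt %in% populist_names,
                                "Populist", "Nonpopulist"))

# Convert to a factor with a fixed level order
votes$populism <- factor(votes$populism, levels = c("Nonpopulist", "Populist"))

print(votes)
```

ifelse() is vectorised, so it classifies the whole column in one call. switch() works on a single value at a time, so it would need a loop or sapply() around it.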

This can be extended to match substrings of the party names instead of full names:

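# Keyword fragments to look for in the lower-cased party names
# (Hmisc::Cs() builds a character vector from unquoted names)
library(stringr)  # for str_extract() and str_to_lower()

left   <- Hmisc::Cs(progres, commun, soc)
centre <- Hmisc::Cs(moderat, demo)
right  <- Hmisc::Cs(people, folk)

# One alternation pattern: "progres|commun|soc|moderat|demo|people|folk"
regexStr <- paste0(c(left, centre, right), collapse = "|")

df %>%
  mutate(fullprtvt = as.character(fullprtvt)) %>%
  # First keyword found in each name, or NA if none matches
  mutate(shortprtvt = str_extract(string = str_to_lower(fullprtvt),
                                  pattern = regexStr)) %>%
  mutate(affiliation =
           case_when(
             shortprtvt %in% left   ~ "left",
             shortprtvt %in% centre ~ "centre",
             shortprtvt %in% right  ~ "right",
             TRUE                   ~ fullprtvt
           ))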

We get the result below. Note that the regex and case_when can be tweaked further, for example to detect names containing two keywords, where one signals left and the other signals, say, agrarian, so that the party is classified as centre or right rather than left.

                                 fullprtvt shortprtvt affiliation
1  Dansk Folkeparti - Danish peoples party       folk       right
2                                 Democrat       demo      centre
3                     Progress Party (FRP)    progres        left
4                     Progress Party (FRP)    progres        left
5                                Socialist        soc        left
6                                 Democrat       demo      centre
7                     Progress Party (FRP)    progres        left
8                         Dansk Folkeparti       folk       right
9                      Sverigedemokraterna       demo      centre
10                     Sverigedemokraterna       demo      centre
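
The two-keyword case mentioned above could be handled by applying rules in order of specificity. The sketch below uses base R grepl() rather than the dplyr pipeline, and the party name "Agrarian Socialist Union" and the "agrar" keyword are invented purely for illustration.

```r
# Made-up party names; "Agrarian Socialist Union" is a hypothetical example
party <- c("Socialist", "Communist", "Democrat", "Agrarian Socialist Union")
name_lc <- tolower(party)

# Start with everything unclassified
affiliation <- rep(NA_character_, length(party))

# Rules run from general to specific; later assignments overwrite earlier ones
affiliation[grepl("soc|commun", name_lc)] <- "left"
affiliation[grepl("demo|moderat", name_lc)] <- "centre"

# Specific override: an agrarian keyword together with "soc" is treated as centre
affiliation[grepl("agrar", name_lc) & grepl("soc", name_lc)] <- "centre"

print(data.frame(party, affiliation))
```

With case_when the same effect comes from listing the most specific condition first, because the first condition that matches wins.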

Upvotes: 1

Allan Cameron

Reputation: 174278

You can use grep to identify common strings in the populist-labelled parties.

Here, I am assuming your data frame is called df and that fullprtvt is a column in that data frame. You'll need to ensure it is a character column. If not, you can alter df$fullprtvt to as.character(df$fullprtvt) in the following:

pop <- grep('Dansk Folkeparti|Sverigedemokraterna|Progress Party', df$fullprtvt)
df$pop_nonpop <- rep("Non-populist", nrow(df))
df$pop_nonpop[pop] <- "Populist"

Here's a reproducible example of some dummy data:

set.seed(69)
df <- data.frame(fullprtvt = sample(c('Dansk Folkeparti',
                                      'Dansk Folkeparti - Danish peoples party',
                                      'Sverigedemokraterna',
                                      'Progress Party (FrP)',
                                      'Progress Party (FRP)',
                                      'Moderate party',
                                      'Communist',
                                      'Socialist',
                                      'Democrat'), 10, TRUE))

df
#>                                  fullprtvt
#> 1                         Dansk Folkeparti
#> 2  Dansk Folkeparti - Danish peoples party
#> 3                                Socialist
#> 4                                Communist
#> 5                                Communist
#> 6                           Moderate party
#> 7                                Communist
#> 8  Dansk Folkeparti - Danish peoples party
#> 9                     Progress Party (FrP)
#> 10 Dansk Folkeparti - Danish peoples party

And here is how the above code works on it:

pop <- grep('Dansk Folkeparti|Sverigedemokraterna|Progress Party', df$fullprtvt)
df$pop_nonpop <- rep("Non-populist", nrow(df))
df$pop_nonpop[pop] <- "Populist"

df
#>                                  fullprtvt   pop_nonpop
#> 1                         Dansk Folkeparti     Populist
#> 2  Dansk Folkeparti - Danish peoples party     Populist
#> 3                                Socialist Non-populist
#> 4                                Communist Non-populist
#> 5                                Communist Non-populist
#> 6                           Moderate party Non-populist
#> 7                                Communist Non-populist
#> 8  Dansk Folkeparti - Danish peoples party     Populist
#> 9                     Progress Party (FrP)     Populist
#> 10 Dansk Folkeparti - Danish peoples party     Populist

Created on 2020-02-20 by the reprex package (v0.3.0)
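
A possible variant of the same idea, shown only as a sketch: grepl() returns a logical vector that can be turned straight into a factor, and the pattern can be built from a vector of names. The populist_patterns vector and the votes data frame are made up for illustration. The last row assumes that paste() was applied to columns containing NA, which yields strings such as "NA NA Sverigedemokraterna".

```r
# Hypothetical list of name fragments that identify the populist parties
populist_patterns <- c("Dansk Folkeparti", "Sverigedemokraterna", "Progress Party")
pattern <- paste(populist_patterns, collapse = "|")

# Made-up data; the last value mimics paste() output when two columns are NA
votes <- data.frame(
  fullprtvt = c("Dansk Folkeparti - Danish peoples party",
                "progress party (FrP)",
                "Moderate party",
                "NA NA Sverigedemokraterna"),
  stringsAsFactors = FALSE
)

# TRUE where any fragment occurs anywhere in the string, ignoring case
is_pop <- grepl(pattern, votes$fullprtvt, ignore.case = TRUE)

# Map FALSE/TRUE onto labelled factor levels
votes$pop_nonpop <- factor(is_pop, levels = c(FALSE, TRUE),
                           labels = c("Non-populist", "Populist"))

print(votes)
```

Because the match is on a substring, it still works when the pasted column carries extra text around the party name, which an exact comparison would miss. Fragments containing regex metacharacters such as parentheses would need escaping before being pasted into the pattern.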

Upvotes: 3
