Reputation: 55
I have a large CSV with two columns titled derived_race and derived_ethnicity, and I am trying to combine them into one column titled Race. The derived_race column contains multiple races, including 'White', 'Black or African American', 'Asian', 'Joint', etc. The derived_ethnicity column contains only 'Hispanic or Latino' and 'Not Hispanic or Latino'. In the new column I only want 4 categories: White, Black, Hispanic, and Other.
For White: derived_race should be 'White' and derived_ethnicity should be 'Not Hispanic or Latino'.
For Black: derived_race should be 'Black or African American' and derived_ethnicity should be 'Not Hispanic or Latino'.
For Hispanic: derived_ethnicity should be 'Hispanic or Latino'.
Other should be everything else.
The current code that I tried to use is:
mutate(Race = ifelse(derived_race == 'Black or African American', derived_ethnicity = 'Not Hispanic or Latino', 'Black', ifelse(derived_race == 'White', derived_ethnicity == 'Not Hispanic or Latino', 'White', ifelse(derived_ethnicity == 'Hispanic or Latino', 'Hispanic', 'Other'))))
I think that I am using the and statements wrong. Thanks in advance for any help!
Upvotes: 0
Views: 92
Reputation: 887048
If the OP meant == instead of =, the "and" operator is &
library(dplyr)
df1 %>%
    mutate(Race = ifelse(derived_race == 'Black or African American' &
                         derived_ethnicity == 'Not Hispanic or Latino', 'Black',
                  ifelse(derived_race == 'White' &
                         derived_ethnicity == 'Not Hispanic or Latino', 'White',
                  ifelse(derived_ethnicity == 'Hispanic or Latino', 'Hispanic', 'Other'))))
Or, instead of a nested ifelse, we can use case_when
df1 %>%
    mutate(Race = case_when(
        derived_race == 'Black or African American' &
            derived_ethnicity == 'Not Hispanic or Latino' ~ 'Black',
        derived_race == 'White' &
            derived_ethnicity == 'Not Hispanic or Latino' ~ 'White',
        derived_ethnicity == 'Hispanic or Latino' ~ 'Hispanic',
        TRUE ~ 'Other'))
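To check the case_when logic on something small, here is a minimal sketch with a made-up data frame. The sample rows, the `not_hispanic` helper variable, and the `result` name are assumptions for illustration only and are not from the OP's data; only the column names and category labels come from the question.

```r
library(dplyr)

# Hypothetical sample data; the labels mirror those described in the question
df1 <- data.frame(
  derived_race = c("White", "Black or African American", "White", "Asian", "Joint"),
  derived_ethnicity = c("Not Hispanic or Latino", "Not Hispanic or Latino",
                        "Hispanic or Latino", "Not Hispanic or Latino",
                        "Hispanic or Latino"),
  stringsAsFactors = FALSE
)

# Illustrative helper so the long label is typed only once
not_hispanic <- "Not Hispanic or Latino"

result <- df1 %>%
  mutate(Race = case_when(
    # Conditions are checked in order; the first one that is TRUE wins
    derived_race == "Black or African American" & derived_ethnicity == not_hispanic ~ "Black",
    derived_race == "White" & derived_ethnicity == not_hispanic ~ "White",
    derived_ethnicity == "Hispanic or Latino" ~ "Hispanic",
    # Fallback for every row not matched above
    TRUE ~ "Other"
  ))

print(result$Race)
# "White" "Black" "Hispanic" "Other" "Hispanic"
```

Note that the third and fifth rows come out as Hispanic regardless of derived_race, which matches the rule in the question that ethnicity alone decides that category.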
Upvotes: 2