Reputation: 11
I'm trying to build an if function that allows me to mutate the "city" column of a dataframe with a certain city name if in the "zipcode" column the value starts with a certain number.
For example: if the zipcode starts with 1, set the city column value to "NYC"; else if it starts with 6, set it to "Chicago"; else if it starts with 2, set it to "Boston"; and so on.
From:
city zipcode
NYC 11211
DC 20910
NYC 11104
NA 11106
NA 2008
NA 60614
To:
city zipcode
NYC 11211
DC 20910
NYC 11104
NYC 11106
DC 2008
Chicago 60614
It's a way to deal with NA values: the if function would keep the same city for the rows where it is already present, and fill in the city name where there's an NA value.
The dataframe is named data.frame, and the columns are named zipcode and city.
Both of them are factor type and have to remain such for my further models.
I want to mutate the dataframe directly, as I will need it for further use.
PS: Sorry for bad writing. I'm new in the community.
Thanks in advance!
Upvotes: 1
Views: 2476
Reputation: 116
Here's a solution that might work for you.
Full code:
# load library
library(tidyverse)

# create the sample dataframe
df <- tribble(
  ~city, ~zipcode,
  'NYC', 11211,
  'DC',  20910,
  'NYC', 11104,
  NA,    11106,
  NA,    2008,
  NA,    60614
)

# fill in the missing cities from the first digit of the zipcode;
# rows that already have a city keep their existing value
df <- df %>%
  mutate(
    city = case_when(
      !is.na(city) ~ city,
      str_sub(zipcode, 1, 1) == '1' ~ 'NYC',
      str_sub(zipcode, 1, 1) == '2' ~ 'DC',
      str_sub(zipcode, 1, 1) == '6' ~ 'Chicago',
      TRUE ~ city
    )
  )

# convert everything to factors
df <- df %>%
  mutate(
    city = as.factor(city),
    zipcode = as.factor(zipcode)
  )

# preview the output
glimpse(df)
The output of glimpse() is:
Observations: 6
Variables: 2
$ city <fct> NYC, DC, NYC, NYC, DC, Chicago
$ zipcode <fct> 11211, 20910, 11104, 11106, 2008, 60614
The trick that I used was to first keep everything as a string or number, fill in the missing values, and only then convert to factor.
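If your columns are already factors, as in the question, the same idea can be sketched in base R without extra packages: convert to character, fill only the NA rows, then convert back. This is a minimal sketch under assumptions, not a definitive implementation. The names city_by_digit, city_chr, first_dig and missing are made up for illustration, and the digit-to-city mapping is simply the one from the sample data.

```r
# Assumed starting point: both columns are already factors, as in the question.
df <- data.frame(
  city    = factor(c("NYC", "DC", "NYC", NA, NA, NA)),
  zipcode = factor(c("11211", "20910", "11104", "11106", "2008", "60614"))
)

# Hypothetical lookup from the first zipcode digit to a city name.
city_by_digit <- c("1" = "NYC", "2" = "DC", "6" = "Chicago")

# Work on character copies: assigning a value that is not an existing
# level into a factor produces NA (with a warning).
city_chr  <- as.character(df$city)
first_dig <- substr(as.character(df$zipcode), 1, 1)

# Fill only the rows where city is missing; existing values are untouched.
# Digits with no entry in the lookup stay NA.
missing <- is.na(city_chr)
city_chr[missing] <- unname(city_by_digit[first_dig[missing]])

# Convert back to factor for the downstream models.
df$city <- factor(city_chr)

print(df)
```

Using a named vector as the lookup keeps the mapping in one place, so adding another city means adding one entry rather than another condition.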
Upvotes: 1