I have two dataframes in R:
city      price  bedroom
San Jose  2000   1
Barstow   1000   1
NA        1500   1
Code to recreate:
data = data.frame(city = c('San Jose', 'Barstow', NA), price = c(2000, 1000, 1500), bedroom = c(1, 1, 1))
and:
Name      Density
San Jose  5358
Barstow   547
Code to recreate:
population_density = data.frame(Name=c('San Jose', 'Barstow'), Density=c(5358, 547));
I want to create an additional column named city_type in the data dataset based on a condition: if the city's population density is above 1000, it's Urban; if it's lower than 1000, it's Suburb; and if the city is NA, city_type is NA.
city      price  bedroom  city_type
San Jose  2000   1        Urban
Barstow   1000   1        Suburb
NA        1500   1        NA
I am using a for loop for conditional flow:
for (row in 1:length(data)) {
  if (is.na(data[row,'city'])) {
    data[row, 'city_type'] = NA
  } else if (population[population$Name == data[row,'city'],]$Density>=1000) {
    data[row, 'city_type'] = 'Urban'
  } else {
    data[row, 'city_type'] = 'Suburb'
  }
}
The for loop runs without error on my original dataset of over 20000 observations; however, it produces many wrong results (mostly NA).
What has gone wrong here and how can I do better to achieve my desired result?
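A likely cause worth checking: for a data frame, length(data) returns the number of columns, not rows, so 1:length(data) visits only the first few rows and every later row keeps NA. In the three-row example both numbers happen to be 3, which hides the problem. The sketch below is one way to repair the loop, not necessarily the asker's intended code: it recreates the inputs (adding an NA to city so every column has three values), uses population_density where the loop says population, iterates over seq_len(nrow(data)), and skips cities missing from the lookup table, where if() would otherwise receive a zero-length condition and stop with an error.

```r
# Recreate the inputs; `city` gets a third value (NA) so all columns match
data <- data.frame(city = c('San Jose', 'Barstow', NA),
                   price = c(2000, 1000, 1500),
                   bedroom = c(1, 1, 1))
population_density <- data.frame(Name = c('San Jose', 'Barstow'),
                                 Density = c(5358, 547))

length(data)  # number of columns (3), not rows
nrow(data)    # number of rows (also 3 in this toy example)

data$city_type <- NA_character_
# seq_len(nrow(data)) visits every row (and nothing for a 0-row frame)
for (row in seq_len(nrow(data))) {
  city <- data$city[row]
  if (is.na(city)) next  # NA city: leave city_type as NA
  density <- population_density$Density[population_density$Name == city]
  if (length(density) == 0) next  # city not in lookup table: leave NA
  data$city_type[row] <- if (density[1] >= 1000) 'Urban' else 'Suburb'
}
data
```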
I have become quite a fan of dplyr pipelines for this type of join/filter/mutate workflow, so here is my suggestion:
library(dplyr)
# `city` needs a third value (NA) so that every column has three entries
data <- data.frame(city = c('San Jose', 'Barstow', NA), price = c(2000,1000, 500), bedroom = c(1,1,1))
population <- data.frame(Name=c('San Jose', 'Barstow'), Density=c(5358, 547));
data %>%
  # join the two dataframes by matching up the city name columns
  left_join(population, by = c("city" = "Name")) %>%
  # add your new column based on the desired condition
  mutate(
    city_type = ifelse(Density >= 1000, "Urban", "Suburb")
  )
Output:
      city price bedroom Density city_type
1 San Jose  2000       1    5358     Urban
2  Barstow  1000       1     547    Suburb
3     <NA>   500       1      NA      <NA>
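If dplyr is not available, base R's merge() can perform the same left join: all.x = TRUE keeps every row of data, and rows with no match (including the NA city) get Density = NA, which ifelse() then passes through as NA. A minimal sketch, assuming the same two data frames, with the question's price of 1500 for the third row:

```r
data <- data.frame(city = c('San Jose', 'Barstow', NA),
                   price = c(2000, 1000, 1500),
                   bedroom = c(1, 1, 1))
population <- data.frame(Name = c('San Jose', 'Barstow'),
                         Density = c(5358, 547))

# all.x = TRUE keeps every row of `data`, like left_join();
# unmatched rows get Density = NA
res <- merge(data, population, by.x = "city", by.y = "Name", all.x = TRUE)
# NA densities propagate through ifelse() as NA city types
res$city_type <- ifelse(res$Density >= 1000, "Urban", "Suburb")
res
```

One difference from left_join(): merge() sorts the result by the key column by default, so the rows may not come back in their original order.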
Use ifelse to create city_type in population_density, then use match to look it up for each city in data:
population_density$city_type=ifelse(population_density$Density>1000,'Urban','Suburb')
data$city_type=population_density$city_type[match(data$city,population_density$Name)]
data
      city price bedroom city_type
1 San Jose  2000       1     Urban
2  Barstow  1000       1    Suburb
3     <NA>  1500       1      <NA>
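The match() call is what makes the NA row come out as NA: match() returns the position of each city in population_density$Name, or NA when there is no match, and indexing a vector with NA yields NA. A small illustration, where 'Fresno' is a hypothetical city that is absent from the lookup table:

```r
population_density <- data.frame(Name = c('San Jose', 'Barstow'),
                                 Density = c(5358, 547))
# 'Fresno' is hypothetical: it is not in population_density$Name
cities <- c('San Jose', 'Barstow', NA, 'Fresno')
idx <- match(cities, population_density$Name)
idx                                # 1 2 NA NA
dens <- population_density$Density[idx]
dens                               # 5358 547 NA NA
# NA densities stay NA through ifelse()
types <- ifelse(dens > 1000, 'Urban', 'Suburb')
types                              # "Urban" "Suburb" NA NA
```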