David Potrel
David Potrel

Reputation: 111

R - adding a variable from another dataset with different # rows

I'm currently working on R on a survey on schools and I would like to add a variable with the population of the city the school is in.

In the first data set I have all the survey respondants which includes a variable "city_name". I have managed to find online a list of the cities with their population which I have imported on R.

What I now would like to do is to add a variable in dataset_1 called city_pop which is equal to the city population when city_name is in both data sets. It might be relevant to know that the first dataset has around 1200 rows while the second one has around 36000 rows.

I've tried several things including the following:

data_set_1$Pop_city = ifelse(data_set_1$city_name == data_set_2$city_name, data_set_2$Pop_city, 0)

Any clues?

Thanks!!

Upvotes: 0

Views: 1278

Answers (2)

Claudio
Claudio

Reputation: 1528

You need to merge the two dataset:

new_df <- merge(data_set_1, data_set_2, by="city_name")

The result will be a dataframe containing only matching rows (in your case, 1200 rows assuming that all cities in data_set_1 are also in data_set_2) and all columns of both data frames. If you want to also keep non-matching rows of data_set_1, you can use the all.x option:

new_df <- merge(data_set_1, data_set_2, by="city_name", all.x=TRUE)

Upvotes: 2

Martin Gal
Martin Gal

Reputation: 16978

Two ways you could try using dplyr:

library(dplyr)

data_set_1 %>% 
  mutate(Pop_city = ifelse(city_name %in% data_set_2$city_name, 
                           data_set_2$Pop_city[city_name == data_set_2$city_name],
                           0))

or using a left_join

data_set_1 %>% 
  left_join(data_set_2, by = "city_name")

perhaps followed by a select(names(data_set_1), Pop_city).

Upvotes: 1

Related Questions