Reputation: 43
df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')
df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')
In the first dataset, each country is assigned to a continent.
In the second dataset, there is country and population information.
How can I combine the population information from the second dataset with the continent information in the first?
Thank you. The problem is that the first dataset lists countries with their continents, while the second lists countries with their populations. What I need is the population of each continent, e.g. Europe = 400 million, Asia = 2.4 billion.
Upvotes: 0
Views: 86
Reputation: 368
Using the dplyr package, all you have to do is join by a common variable, in this case the country name. Since the column is called countryName in one data frame and country_name in the other, we just have to specify that they refer to the same variable.
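library(dplyr)
library(stringr)

df %>%
  # match countryName in df to country_name in df8
  left_join(df8, by = c("countryName" = "country_name")) %>%
  # population is stored as text with thousands separators, e.g. "1,234"
  mutate(population = as.numeric(str_remove_all(population, ","))) %>%
  # keep one row per country (the last one in the time series)
  group_by(countryName) %>%
  slice_tail(n = 1) %>%
  # sum the country populations within each continent
  group_by(region) %>%
  summarize(population = sum(population, na.rm = TRUE))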
# A tibble: 5 x 2
  region   population
* <chr>         <dbl>
1 Africa   1304908713
2 Americas 1019607512
3 Asia     4592311527
4 Europe    738083720
5 Oceania    40731992
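One thing worth checking: because the join is on the country name and the sum uses na.rm = TRUE, any country spelled differently in the two files would be dropped from its continent's total without any warning. Below is a minimal sketch of how such mismatches could be listed with anti_join. The data frames cases and pops and all values in them are made-up stand-ins for df and df8, not real figures.

```r
library(dplyr)

# Hypothetical toy data standing in for df and df8 (values are invented)
cases <- data.frame(
  countryName = c("France", "Germany", "US"),
  region      = c("Europe", "Europe", "Americas")
)
pops <- data.frame(
  country_name = c("France", "Germany", "United States"),
  population   = c("1,000", "2,000", "3,000")
)

# Countries in cases that have no matching name in pops
unmatched <- cases %>%
  distinct(countryName, region) %>%
  anti_join(pops, by = c("countryName" = "country_name"))

print(unmatched)
```

In this toy example only the "US" row comes back, since the other file calls it "United States". With the real data, any names listed this way could be recoded in one of the data frames before joining.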
Upvotes: 1