vahit ünsal

Reputation: 43

Searching and using databases

df <- read.csv('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')

df8 <- read.csv('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')

The first dataset lists countries together with the continent (region) each one belongs to.

The second dataset contains countries and their populations.

How can I combine the population information in the second dataset with the continent information in the first?

Thank you. To restate the problem: the first dataset assigns each country to a continent, while the second holds each country's population. What I need is the total population of each continent, e.g. Europe = 400 million, Asia = 2.4 billion.

Upvotes: 0

Views: 86

Answers (1)

Sergio Romero

Reputation: 368

With the dplyr package, all you have to do is join on a common variable, in this case the country name. Because that column is called countryName in one data frame and country_name in the other, we have to tell the join that the two columns hold the same variable.

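library(dplyr)
library(stringr)

df %>%
    # match countryName (df) against country_name (df8)
    left_join(df8, by = c("countryName" = "country_name")) %>%
    # strip thousands separators before converting to numeric
    mutate(population = as.numeric(str_remove_all(population, ","))) %>%
    # df has many rows per country; keep only the last one for each
    group_by(countryName) %>%
    slice_tail(n = 1) %>%
    # add up the country populations within each continent
    group_by(region) %>%
    summarize(population = sum(population, na.rm = TRUE))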

# A tibble: 5 x 2
  region   population
* <chr>         <dbl>
1 Africa   1304908713
2 Americas 1019607512
3 Asia     4592311527
4 Europe    738083720
5 Oceania    40731992
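
One thing worth checking: a left join leaves population as NA for any country whose name is spelled differently in the two files, and na.rm = TRUE then drops it from the sum without warning. Below is a minimal sketch of how you could list those countries with anti_join(). The data frames reports and populations and all their values are made-up toy data, not rows from the real CSV files; only the column names follow the ones used above. It uses base gsub() instead of stringr to keep the dependencies down.

```r
library(dplyr)

# Hypothetical toy data standing in for the two CSV files (values are invented).
reports <- data.frame(
  countryName = c("A", "A", "B", "C", "D"),
  region      = c("Europe", "Europe", "Europe", "Asia", "Oceania"),
  stringsAsFactors = FALSE
)
populations <- data.frame(
  country_name = c("A", "B", "C"),
  population   = c("1,000", "2,500", "4,000"),
  stringsAsFactors = FALSE
)

# Reduce the report to one row per country before joining.
countries <- distinct(reports, countryName, region)

# Countries with no match in the population table.
unmatched <- anti_join(countries, populations,
                       by = c("countryName" = "country_name"))

# Same join and per-continent sum as in the answer, on the toy data.
totals <- countries %>%
  left_join(populations, by = c("countryName" = "country_name")) %>%
  mutate(population = as.numeric(gsub(",", "", population))) %>%
  group_by(region) %>%
  summarize(population = sum(population, na.rm = TRUE))

print(unmatched)
print(totals)
```

In this toy example country D has no population entry, so it shows up in unmatched and Oceania sums to 0. If the same check on the real data returns rows, the names would need to be harmonised before the continent totals can be trusted.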

Upvotes: 1
