savasseker

Reputation: 29

Combining information from different data sets

Countries and continents are in this data set.

df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv')

#This data set contains countries and population information.

df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv')


library(dplyr)
library(stringr)

df %>% 
    left_join(df8, by = c("countryName" = "country_name")) %>% 
    mutate(population = as.numeric(str_remove_all(population, ","))) %>% 
    group_by(countryName) %>%
    slice_tail(1) %>%
    group_by(region) %>% 
    summarize(population = sum(population, na.rm = TRUE)) 

Running df %>% left_join(df8, by = c(countryName = "country_name")) %>% gives this error: No function "%>%" found. Can you explain why and provide a solution?

How can I combine continental information in data set 1 with population information in data set 2?

For example: Asia 2.8 billion, Africa 800 million, Europe 1 billion.

Upvotes: 0

Views: 67

Answers (1)

Peter

Reputation: 12739

You've got a couple of issues going on here:

1) Country names are read in as factors by read.csv (the default in R versions before 4.0.0); you can resolve this with the argument stringsAsFactors = FALSE.

2) slice_tail: I'm not sure where this comes from; is dplyr::slice what you are looking for?
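On point 2, a minimal sketch of the difference between slice and slice_tail, assuming dplyr 1.0.0 or later (the release that introduced slice_tail). The toy data frame and its values are made up for illustration and are not part of either data set. Note that slice_tail expects the row count as a named argument, n = 1, rather than a bare 1.

```r
library(dplyr)

# Hypothetical stand-in for the time series: two rows per country.
toy <- data.frame(
  countryName = c("A", "A", "B", "B"),
  day         = c(1, 2, 1, 2),
  confirmed   = c(10, 20, 5, 7),
  stringsAsFactors = FALSE
)

# First row of each country.
first_rows <- toy %>%
  group_by(countryName) %>%
  slice(1) %>%
  ungroup()

# Last row of each country; slice(n()) also works on older dplyr versions.
last_rows <- toy %>%
  group_by(countryName) %>%
  slice(n()) %>%
  ungroup()

# Same result with slice_tail (dplyr >= 1.0.0); n has to be named.
last_rows_tail <- toy %>%
  group_by(countryName) %>%
  slice_tail(n = 1) %>%
  ungroup()
```

For the population total it makes no difference which row is kept, because the join repeats the same population value on every row of a country.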


df <- read.csv ('https://raw.githubusercontent.com/ulklc/covid19-timeseries/master/countryReport/raw/rawReport.csv',
                stringsAsFactors = FALSE)

#This data set contains countries and population information.

df8 <- read.csv ('https://raw.githubusercontent.com/hirenvadher954/Worldometers-Scraping/master/countries.csv',
                 stringsAsFactors = FALSE)


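library(dplyr)
library(stringr)

df %>%
  # attach the population column by matching country names
  left_join(df8, by = c("countryName" = "country_name")) %>%
  # population is text such as "1,234,567"; drop the commas, then convert
  mutate(population = as.numeric(str_remove_all(population, ","))) %>%
  # keep one row per country so each population is counted once
  group_by(countryName) %>%
  slice(1) %>%
  # total population per continent
  group_by(region) %>%
  summarize(population = sum(population, na.rm = TRUE))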
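As for the error quoted in the question: R cannot find %>% when no attached package provides the pipe, which typically means library(dplyr) was never run in that session or failed because the package is not installed. That reading is an assumption, since the question does not show how the script was run. A small sketch of a check you could run first (pipe_found is just an illustrative name):

```r
# Attach dplyr only if it is installed; otherwise say what to do next.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
} else {
  message('dplyr is not installed; run install.packages("dplyr") first')
}

# TRUE once some attached package (dplyr or magrittr) provides the pipe.
pipe_found <- exists("%>%")
```

If pipe_found is FALSE, the pipeline above will stop at the first %>%.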

This gives you:

df
## # A tibble: 5 x 2
##   region   population
##   <chr>         <dbl>
## 1 Africa   1304908713
## 2 Americas 1019607512
## 3 Asia     4592311527
## 4 Europe    738083720
## 5 Oceania    40731992
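One thing to keep in mind: left_join leaves population as NA for any country whose name is spelled differently in the two files, and na.rm = TRUE then silently drops it from the total. Here is a self-contained sketch of the same pipeline on made-up data, with an anti_join to list the unmatched countries. The data frames cases and pops and all their values are hypothetical, and gsub is used as a base-R substitute for str_remove_all so that only dplyr is needed.

```r
library(dplyr)

# Hypothetical stand-ins for df and df8.
cases <- data.frame(
  countryName = c("Alpha", "Alpha", "Beta", "Gamma", "Delta"),
  region      = c("Asia", "Asia", "Asia", "Europe", "Europe"),
  stringsAsFactors = FALSE
)
pops <- data.frame(
  country_name = c("Alpha", "Beta", "Gamma"),
  population   = c("1,000", "2,500", "300"),
  stringsAsFactors = FALSE
)

# Countries with no population match ("Delta" in this toy data).
unmatched <- cases %>%
  distinct(countryName, region) %>%
  anti_join(pops, by = c("countryName" = "country_name"))

# Same steps as the answer: join, clean, one row per country, sum by region.
by_region <- cases %>%
  left_join(pops, by = c("countryName" = "country_name")) %>%
  mutate(population = as.numeric(gsub(",", "", population, fixed = TRUE))) %>%
  group_by(countryName) %>%
  slice(1) %>%
  group_by(region) %>%
  summarize(population = sum(population, na.rm = TRUE))
```

If unmatched is not empty on the real data, the regional totals are missing those countries until their names are reconciled.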

Upvotes: 1
