Subtracting only specific rows based on name value in R

Question

I have a dataframe with two columns, region and number of user. Region is sometimes combination of countries and continent (see below the dummy sample). Ex: user number for Oceania (New Zeland included) and New Zeland. I am looking for a way to subtract country number from continent row, in other word having two rows without duplication, Oceania(without New Zeland) and New Zeland. Same for Europe and Germany, etc

Here the sample data

df = data.frame(
  regions_user = c(2200, 1250, 900, 21200, 14200, 150000, 35000, 41900),
  regions_names = c(
    "Oceania",
    "New Zeland",
    "Africa",
    "North America",
    "Asia",
    "Europe",
    "Germany",
    "Latin America"
  ))

#sample data output

  regions_user regions_names
1         2200       Oceania
2         1250    New Zeland
3          900        Africa
4        21200 North America
5        14200          Asia
6       150000        Europe
7        35000       Germany
8        41900 Latin America

#desired output

regions_user regions_names
1          950       Oceania
2         1250    New Zeland
3          900        Africa
4        21200 North America
5        14200          Asia
6       115000        Europe
7        35000       Germany
8        41900 Latin America

Ronak Shah · Accepted Answer

Provided you have or create such a mapping_list :

library(dplyr)
mapping_list <- data.frame(country = c('New Zeland', 'Germany'), 
                           continent = c('Oceania', 'Europe'))

mapping_list
#     country continent
#1 New Zeland   Oceania
#2    Germany    Europe

You can perform a join and subtract the value :

df %>%
  left_join(mapping_list, by = c('regions_names' = 'country')) %>%
  group_by(continent) %>%
  summarise(subtract = sum(regions_user)) %>%
  right_join(df, by = c('continent' = 'regions_names')) %>%
  transmute(regions_names = continent, 
            regions_user = ifelse(is.na(subtract), regions_user, regions_user - subtract))

#  regions_names regions_user
#                  
#1 Europe              115000
#2 Oceania                950
#3 New Zeland            1250
#4 Africa                 900
#5 North America        21200
#6 Asia                 14200
#7 Germany              35000
#8 Latin America        41900

Subtracting only specific rows based on name value in R

Answers (2)

Related Questions