DanG
DanG

Reputation: 741

Subtracting only specific rows based on name value in R

I have a dataframe with two columns, region and number of user. Region is sometimes combination of countries and continent (see below the dummy sample). Ex: user number for Oceania (New Zeland included) and New Zeland. I am looking for a way to subtract country number from continent row, in other word having two rows without duplication, Oceania(without New Zeland) and New Zeland. Same for Europe and Germany, etc

Here the sample data

df = data.frame(
  regions_user = c(2200, 1250, 900, 21200, 14200, 150000, 35000, 41900),
  regions_names = c(
    "Oceania",
    "New Zeland",
    "Africa",
    "North America",
    "Asia",
    "Europe",
    "Germany",
    "Latin America"
  ))

#sample data output

  regions_user regions_names
1         2200       Oceania
2         1250    New Zeland
3          900        Africa
4        21200 North America
5        14200          Asia
6       150000        Europe
7        35000       Germany
8        41900 Latin America

#desired output

regions_user regions_names
1          950       Oceania
2         1250    New Zeland
3          900        Africa
4        21200 North America
5        14200          Asia
6       115000        Europe
7        35000       Germany
8        41900 Latin America

Upvotes: 1

Views: 1011

Answers (2)

TarJae
TarJae

Reputation: 78937

In case of, country is always in the row below the region as in your example dataset, you can use case_when to check if the vector regions is in the dataframe, then substract the row below with lag.

library(dplyr)

regions <- c("Oceania", "Europe")

df %>% 
  mutate(regions_user = case_when(
    regions_names %in% regions ~ regions_user - lead(regions_user),
    TRUE ~ regions_user))

# Output
  regions_user regions_names
1          950       Oceania
2         1250    New Zeland
3          900        Africa
4        21200 North America
5        14200          Asia
6       115000        Europe
7        35000       Germany
8        41900 Latin America

Upvotes: 2

Ronak Shah
Ronak Shah

Reputation: 388992

Provided you have or create such a mapping_list :

library(dplyr)
mapping_list <- data.frame(country = c('New Zeland', 'Germany'), 
                           continent = c('Oceania', 'Europe'))

mapping_list
#     country continent
#1 New Zeland   Oceania
#2    Germany    Europe

You can perform a join and subtract the value :

df %>%
  left_join(mapping_list, by = c('regions_names' = 'country')) %>%
  group_by(continent) %>%
  summarise(subtract = sum(regions_user)) %>%
  right_join(df, by = c('continent' = 'regions_names')) %>%
  transmute(regions_names = continent, 
            regions_user = ifelse(is.na(subtract), regions_user, regions_user - subtract))

#  regions_names regions_user
#  <chr>                <dbl>
#1 Europe              115000
#2 Oceania                950
#3 New Zeland            1250
#4 Africa                 900
#5 North America        21200
#6 Asia                 14200
#7 Germany              35000
#8 Latin America        41900

Upvotes: 1

Related Questions