Reputation: 741
I have a dataframe with two columns, region and number of user. Region is sometimes combination of countries and continent (see below the dummy sample). Ex: user number for Oceania (New Zeland included) and New Zeland. I am looking for a way to subtract country number from continent row, in other word having two rows without duplication, Oceania(without New Zeland) and New Zeland. Same for Europe and Germany, etc
Here the sample data
df = data.frame(
regions_user = c(2200, 1250, 900, 21200, 14200, 150000, 35000, 41900),
regions_names = c(
"Oceania",
"New Zeland",
"Africa",
"North America",
"Asia",
"Europe",
"Germany",
"Latin America"
))
#sample data output
regions_user regions_names
1 2200 Oceania
2 1250 New Zeland
3 900 Africa
4 21200 North America
5 14200 Asia
6 150000 Europe
7 35000 Germany
8 41900 Latin America
#desired output
regions_user regions_names
1 950 Oceania
2 1250 New Zeland
3 900 Africa
4 21200 North America
5 14200 Asia
6 115000 Europe
7 35000 Germany
8 41900 Latin America
Upvotes: 1
Views: 1011
Reputation: 78937
In case of, country is always in the row below the region as in your example dataset, you can use
case_when
to check if the vector regions
is in the dataframe, then substract the row below with lag
.
library(dplyr)
regions <- c("Oceania", "Europe")
df %>%
mutate(regions_user = case_when(
regions_names %in% regions ~ regions_user - lead(regions_user),
TRUE ~ regions_user))
# Output
regions_user regions_names
1 950 Oceania
2 1250 New Zeland
3 900 Africa
4 21200 North America
5 14200 Asia
6 115000 Europe
7 35000 Germany
8 41900 Latin America
Upvotes: 2
Reputation: 388992
Provided you have or create such a mapping_list
:
library(dplyr)
mapping_list <- data.frame(country = c('New Zeland', 'Germany'),
continent = c('Oceania', 'Europe'))
mapping_list
# country continent
#1 New Zeland Oceania
#2 Germany Europe
You can perform a join and subtract the value :
df %>%
left_join(mapping_list, by = c('regions_names' = 'country')) %>%
group_by(continent) %>%
summarise(subtract = sum(regions_user)) %>%
right_join(df, by = c('continent' = 'regions_names')) %>%
transmute(regions_names = continent,
regions_user = ifelse(is.na(subtract), regions_user, regions_user - subtract))
# regions_names regions_user
# <chr> <dbl>
#1 Europe 115000
#2 Oceania 950
#3 New Zeland 1250
#4 Africa 900
#5 North America 21200
#6 Asia 14200
#7 Germany 35000
#8 Latin America 41900
Upvotes: 1