Reputation: 93
I have a data set with import and export numbers from countries which looks basically like this:
Country_from Country_to Count Value
UK USA 5 10
France Belgium 4 7
USA UK 1 6
Belgium France 8 9
Now, I want to aggregate this data and to combine the import and export numbers by summation. So, I want my resulting dataframe to be:
Country_from Country_to Count Value
UK USA 6 16
France Belgium 12 16
I made a script which concates the to
and from
countries and then sorts the characters alphabetically to check whether, for example, UK - USA and USA-UK are the same and then aggregates the values.
This sorting part of my code looks like the following:
#concatenate to and from country name
country_from = data.frame(lapply(data_lines$Country_from, as.character), stringsAsFactors=FALSE)
country_to = data.frame(lapply(data_lines$Country_to, as.character), stringsAsFactors=FALSE)
concat_names = as.matrix(paste(country_from, country_to, " "))
#order characters alphabetically
strSort <- function(x)
sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")
sorted = strSort(concat_names)
This approach works in this specific case, but it could theoretically be the case that two different countries have the same alphabetically sorted characters.
If there is a Country_from
-Country_to
combination without the same reverse, then it should save the values as they are given (so do nothing).
Does anyone have an idea how to do this without using the alphabetically sorted characters?
Upvotes: 1
Views: 48
Reputation: 389175
One way using dplyr
would be to create a rowwise
grouping variable by sorting and pasting Country_from
and Country_to
and then take sum
by that group.
library(dplyr)
df %>%
rowwise() %>%
mutate(country = paste(sort(c(Country_from, Country_to)), collapse = "-")) %>%
ungroup() %>%
group_by(country) %>%
summarise_at(vars(Count:Value), funs(sum))
# country Count Value
# <chr> <int> <int>
#1 Belgium-France 12 16
#2 UK-USA 6 16
Here, instead of sorting the characters we are sorting the words.
Upvotes: 1