Activation
Activation

Reputation: 93

Aggregate data in dataframe by first transforming values in column

I have a data set with import and export numbers from countries which looks basically like this:

Country_from   Country_to    Count    Value
   UK             USA          5        10
   France         Belgium      4        7
   USA            UK           1        6
   Belgium        France       8        9

Now, I want to aggregate this data and to combine the import and export numbers by summation. So, I want my resulting dataframe to be:

 Country_from   Country_to    Count    Value
   UK             USA          6        16
   France         Belgium      12       16

I made a script which concates the to and from countries and then sorts the characters alphabetically to check whether, for example, UK - USA and USA-UK are the same and then aggregates the values.

This sorting part of my code looks like the following:

#concatenate to and from country name
country_from = data.frame(lapply(data_lines$Country_from, as.character), stringsAsFactors=FALSE)
country_to = data.frame(lapply(data_lines$Country_to, as.character), stringsAsFactors=FALSE)
concat_names = as.matrix(paste(country_from, country_to, " "))

#order characters alphabetically
strSort <- function(x)
 sapply(lapply(strsplit(x, NULL), sort), paste, collapse="")

sorted = strSort(concat_names)

This approach works in this specific case, but it could theoretically be the case that two different countries have the same alphabetically sorted characters.

If there is a Country_from-Country_to combination without the same reverse, then it should save the values as they are given (so do nothing).

Does anyone have an idea how to do this without using the alphabetically sorted characters?

Upvotes: 1

Views: 48

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389175

One way using dplyr would be to create a rowwise grouping variable by sorting and pasting Country_from and Country_to and then take sum by that group.

library(dplyr)

df %>%
  rowwise() %>%
  mutate(country = paste(sort(c(Country_from, Country_to)), collapse = "-")) %>%
  ungroup() %>%
  group_by(country) %>%
  summarise_at(vars(Count:Value), funs(sum))

#   country        Count Value
#  <chr>          <int> <int>
#1 Belgium-France    12    16
#2 UK-USA             6    16

Here, instead of sorting the characters we are sorting the words.

Upvotes: 1

Related Questions