Reputation: 601
Say I have some network data as shown below:
col_a <- c("A","B","C")
col_b <- c("B","A","A")
val <- c(1,3,7)
df <- data.frame(col_a, col_b, val)
df
col_a col_b val
1 A B 1
2 B A 3
3 C A 7
This could be a network, where val is the weight of the edge between the two nodes. However, I want to sum the weights of A-B and B-A (ignoring direction) to get the following:
new_col_a <- c("A", "A")
new_col_b <- c("B", "C")
new_val <- c(4,7)
want_df <- data.frame(new_col_a, new_col_b, new_val)
want_df
new_col_a new_col_b new_val
1 A B 4
2 A C 7
Is there a way to do this in dplyr?
Upvotes: 3
Views: 80
Reputation: 93938
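If you make your data into a tidy, long form first, then it becomes quite a bit simpler. Add a row id, convert to long, sort the two labels within each row, spread back to wide, then group and sum val:
library(dplyr)
library(tidyr)
df %>%
  mutate(id = row_number()) %>%      # keep track of the original row
  gather(grp, col, -val, -id) %>%
  group_by(id) %>%
  mutate(col = sort(col)) %>%        # sort the pair of labels within each row
  ungroup() %>%
  spread(grp, col) %>%
  group_by(col_a, col_b) %>%
  summarize(val = sum(val))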
## A tibble: 2 x 3
## Groups: col_a [?]
# col_a col_b val
# <chr> <chr> <dbl>
#1 A B 4
#2 A C 7
Upvotes: 0
Reputation: 3183
You could use dplyr for this:
df <- data.frame(col_a, col_b, val, stringsAsFactors = FALSE)
library(dplyr)
library(tidyr)
df %>%
mutate(
pair = purrr::pmap_chr(
.l = list(from = col_a, to = col_b),
.f = function(from, to) paste(sort(c(from, to)), collapse = "_")
)
) %>%
group_by(pair) %>%
summarise(new_val = sum(val)) %>%
separate(pair, c("new_col_a", "new_col_b"), sep = "_")
# A tibble: 2 x 3
new_col_a new_col_b new_val
<chr> <chr> <dbl>
1 A B 4
2 A C 7
Similar to one of my earlier answers
Upvotes: 2
Reputation: 40171
One dplyr possibility could be:
df %>%
mutate_if(is.factor, as.character) %>%
group_by(grp = paste(pmin(col_a, col_b), pmax(col_a, col_b), sep = "_")) %>%
summarise(val = sum(val))
grp val
<chr> <dbl>
1 A_B 4
2 A_C 7
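To see why this groups A-B together with B-A, it may help to look at the key on its own. The sketch below uses only base R and the vectors from the question: pmin() and pmax() compare the two columns element by element, so each row gets the same key regardless of direction.

```r
col_a <- c("A", "B", "C")
col_b <- c("B", "A", "A")

# Element-wise smaller label first, larger label second
key <- paste(pmin(col_a, col_b), pmax(col_a, col_b), sep = "_")
key
# [1] "A_B" "A_B" "A_C"
```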
Or with tidyverse, using a similar idea to @Sonny's:
library(tidyr)
library(purrr)
df %>%
  mutate_if(is.factor, as.character) %>%
  nest(data = c(col_a, col_b)) %>%   # tidyr >= 1.0 syntax
  group_by(grp = map_chr(data, function(x) paste(sort(unlist(x)), collapse = "_"))) %>%
  summarise(val = sum(val))
If you also want to separate it into two columns (this step also requires tidyr):
df %>%
mutate_if(is.factor, as.character) %>%
group_by(grp = paste(pmin(col_a, col_b), pmax(col_a, col_b), sep = "_")) %>%
summarise(val = sum(val)) %>%
separate(grp, c("new_col_a", "new_col_b"), sep = "_")
new_col_a new_col_b val
<chr> <chr> <dbl>
1 A B 4
2 A C 7
Or, for the second possibility:
library(tidyr)
library(purrr)
df %>%
  mutate_if(is.factor, as.character) %>%
  nest(data = c(col_a, col_b)) %>%   # tidyr >= 1.0 syntax
  group_by(grp = map_chr(data, function(x) paste(sort(unlist(x)), collapse = "_"))) %>%
  summarise(val = sum(val)) %>%
  separate(grp, c("new_col_a", "new_col_b"), sep = "_")
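If the node labels might themselves contain "_", pasting and then separating could split them in the wrong place. A minimal sketch of a variant that skips the string key and writes the pmin()/pmax() results straight into two columns is shown below. It assumes the columns are character rather than factor, and the name res is only for illustration.

```r
library(dplyr)

df <- data.frame(
  col_a = c("A", "B", "C"),
  col_b = c("B", "A", "A"),
  val   = c(1, 3, 7),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(
    new_col_a = pmin(col_a, col_b),  # smaller label of each pair
    new_col_b = pmax(col_a, col_b)   # larger label of each pair
  ) %>%
  group_by(new_col_a, new_col_b) %>%
  summarise(new_val = sum(val)) %>%
  ungroup()

res
```

For the example data this should give the two rows from the question (A-B with 4, A-C with 7), with no tidyr step needed.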
Upvotes: 3