Reputation: 1201
My goal is to create an igraph graph object that I can later use to plot with ggraph.
My tidy data are invoices that each include a different number of items. n is the number of times each invoice type occurs in the original sample. For example, in the data below, invoice type 1, which includes bread, butter and eggs, was invoiced 10 times.
library(tidyverse)
data <- tibble(invoicetype = c(1,1,1,2,2,3,3,4,4,4,4,4,5,5,6,7,7,8,8,8,9,9),
               item = c("bread", "butter", "eggs", "bread", "coke", "coke", "eggs",
                        "bread", "butter","coke", "pasta", "water", "coke", "water",
                        "coke", "bread", "butter", "eggs", "coke", "water", "pasta",
                        "bread"),
               n = c(10,10,10,8,8,7,7,4,4,4,4,4,3,3,3,2,2,1,1,1,1,1))
I want to create an igraph object that takes into account how many times each item was combined on the same invoice with any other item.
Question: is there a simple way to do this?
My cumbersome solution:
I came up with the following solution, but it is not elegant and does not work with my actual (big) data.
data_spreaded <- data %>% group_by(invoicetype, n) %>%
  summarise(item1 = item[1], item2 = item[2], item3 = item[3],
            item4 = item[4], item5 = item[5])
combinations <- tibble()
for (g in 1:nrow(data_spreaded)) {
  for (i in 3:ncol(data_spreaded)) {
    for (j in 3:ncol(data_spreaded)) {
      if (i == j) { next }
      combinations <-
        bind_rows(combinations,
                  tibble(from = data_spreaded[g,i] %>% pull(),
                         to = data_spreaded[g,j] %>% pull(),
                         invoicetype = data_spreaded[g,1] %>% pull(),
                         n = data_spreaded[g,2] %>% pull()))
    }
  }
}
combinations <- combinations %>%
  distinct() %>% # remove the double counted
  filter(!is.na(from), !is.na(to)) %>% # remove empty combinations
  group_by(from, to) %>%
  summarise(n = sum(n)) %>%
  ungroup()
library(igraph)
g <- graph_from_data_frame(combinations, directed = FALSE)
To plot using ggraph I use:
E(g)$weight <- combinations$n
library(ggraph)
set.seed(123)
ggraph(g, layout = "with_kk") +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE) +
  geom_edge_link(aes(color = weight, label = n))
Upvotes: 0
Views: 355
Reputation: 667
I usually tailor something like this to similar situations.
library(tidyverse)
data <- tibble(invoicetype = c(1,1,1,2,2,3,3,4,4,4,4,4,5,5,6,7,7,8,8,8,9,9),
item = c("bread", "butter", "eggs", "bread", "coke", "coke", "eggs",
"bread", "butter","coke", "pasta", "water", "coke", "water",
"coke", "bread", "butter", "eggs", "coke", "water", "pasta",
"bread"),
n = c(10,10,10,8,8,7,7,4,4,4,4,4,3,3,3,2,2,1,1,1,1,1))
data %>%
  mutate(item2 = item) %>% # make a second item column
  group_by(invoicetype) %>%
  expand(item, item2, nesting(n)) %>% # get all in-group combinations
  ungroup() %>%
  filter(item != item2) %>% # drop loops
  mutate(from = map2_chr(item, item2, min), # for undirected, sort dyad's names...
         to = map2_chr(item, item2, max)) %>% # ... alphabetically
  # drop duplicate rows; invoicetype is kept so that the same pair with the
  # same n on two different invoice types is not collapsed into one row
  distinct(invoicetype, from, to, n) %>%
  group_by(from, to) %>%
  summarise(weight = sum(n)) %>%
  ungroup()
#> # A tibble: 14 x 3
#>    from   to     weight
#>    <chr>  <chr>   <dbl>
#>  1 bread  butter     16
#>  2 bread  coke       12
#>  3 bread  eggs       10
#>  4 bread  pasta       5
#>  5 bread  water       4
#>  6 butter coke        4
#>  7 butter eggs       10
#>  8 butter pasta       4
#>  9 butter water       4
#> 10 coke   eggs        8
#> 11 coke   pasta       4
#> 12 coke   water       8
#> 13 eggs   water       1
#> 14 pasta  water       4
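Since the goal in the question is an igraph object for ggraph, here is a minimal sketch of that last step. It assumes the edge list is exactly the table printed above (retyped by hand into a hypothetical `edges` tibble so the snippet runs on its own) and that a Kamada-Kawai layout is acceptable; the aesthetics are one possible choice, not the only one.

```r
library(tibble)
library(igraph)
library(ggraph)

# Edge list retyped from the output above; `edges` is an illustrative name.
edges <- tribble(
  ~from,    ~to,      ~weight,
  "bread",  "butter", 16,
  "bread",  "coke",   12,
  "bread",  "eggs",   10,
  "bread",  "pasta",   5,
  "bread",  "water",   4,
  "butter", "coke",    4,
  "butter", "eggs",   10,
  "butter", "pasta",   4,
  "butter", "water",   4,
  "coke",   "eggs",    8,
  "coke",   "pasta",   4,
  "coke",   "water",   8,
  "eggs",   "water",   1,
  "pasta",  "water",   4
)

# The first two columns become the edge endpoints; `weight` is carried
# over as an edge attribute.
g <- graph_from_data_frame(edges, directed = FALSE)

set.seed(123)
p <- ggraph(g, layout = "kk") +
  geom_edge_link(aes(edge_colour = weight, label = weight)) +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)
print(p)
```

Because each pair appears only once in the edge list, the graph has one edge per pair, so there is no need to set `E(g)$weight` separately.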
Upvotes: 0
Reputation: 865
A lot of time can be saved if you just left join the data to itself. Many edge lists can be built with this type of workflow:
combo <- data %>%
  # join the data to itself
  left_join(data, by = c('invoicetype', 'n')) %>%
  # this is undirected, so x %--% y is the same as y %--% x
  filter(item.x < item.y) %>%
  group_by(item.x, item.y) %>%
  summarize(n = sum(n))
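On dplyr 1.1.0 or later, a self-join like this one emits a many-to-many warning, because every invoice row matches several rows on the other side. The sketch below assumes that dplyr version and declares the relationship explicitly; it also repeats the question's data so that it runs on its own. The names `edges`, `from` and `to` are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same sample data as in the question.
data <- tibble(
  invoicetype = c(1,1,1,2,2,3,3,4,4,4,4,4,5,5,6,7,7,8,8,8,9,9),
  item = c("bread", "butter", "eggs", "bread", "coke", "coke", "eggs",
           "bread", "butter", "coke", "pasta", "water", "coke", "water",
           "coke", "bread", "butter", "eggs", "coke", "water", "pasta",
           "bread"),
  n = c(10,10,10,8,8,7,7,4,4,4,4,4,3,3,3,2,2,1,1,1,1,1)
)

edges <- data %>%
  # Self-join: every item on an invoice is paired with every item on it.
  # `relationship` requires dplyr >= 1.1.0 (assumption of this sketch).
  left_join(data, by = c("invoicetype", "n"),
            relationship = "many-to-many") %>%
  # Keep each unordered pair once and drop self-pairs.
  filter(item.x < item.y) %>%
  group_by(from = item.x, to = item.y) %>%
  summarise(n = sum(n), .groups = "drop")

edges
```

For this sample the result has 14 rows, one per item pair, and the bread/butter pair gets n = 16 (10 + 4 + 2 from invoice types 1, 4 and 7).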
Here's the plot:
library(igraph)
g <- graph_from_data_frame(combo, directed = FALSE)
g_strength <- strength(g, weights = E(g)$n)
set.seed(1234)
plot(g,
     edge.width = E(g)$n/max(E(g)$n) * 10,
     vertex.size = g_strength/max(g_strength) * 20)
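To see what `strength()` contributes to the vertex sizes, here is a small sketch on a three-edge subset of the result. The `toy` data frame and `g_toy` graph are illustrative and not part of the answer; the weights are taken from the bread/butter/coke pairs of the sample data.

```r
library(igraph)

# Illustrative subset of the edge list: three items, three edges.
toy <- data.frame(
  from = c("bread", "bread", "butter"),
  to   = c("butter", "coke", "coke"),
  n    = c(16, 12, 4)
)
g_toy <- graph_from_data_frame(toy, directed = FALSE)

# Strength is the sum of the weights of the edges touching each vertex:
# bread = 16 + 12, butter = 16 + 4, coke = 12 + 4.
s <- strength(g_toy, weights = E(g_toy)$n)
print(s)

# Scaled the same way as in the plot call: the strongest vertex gets size 20.
print(s / max(s) * 20)
```

Items that co-occur often and with many other items therefore end up as the largest vertices in the plot.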
I hope this helps
Upvotes: 3