I want to find which rows share a common value in one column, so that those rows can be linked to each other.
This is what my dataframe looks like:
Location | Manager |
---|---|
L1 | M45 |
L2 | M45 |
L34 | M12 |
L5 | M45 |
L23 | M12 |
L4 | M3 |
L11 | M45 |
I want to create a new data frame with two columns: Location and Links. The Links column should contain all the other locations that share the same manager. For example, since L1, L2, L5 and L11 all have manager M45, they should be linked together, and so on.
Location | Links |
---|---|
L1 | L2,L5,L11 |
L2 | L1,L5,L11 |
L34 | L23 |
L5 | L1,L2,L11 |
L23 | L34 |
L4 | |
L11 | L1,L2,L5 |
After this, can we create a network graph?
Thanks!
For the first part (getting all locations covered by a manager in a single row) we can do:
```r
library(dplyr)

df %>%
  group_by(Manager) %>%
  summarize(Location = paste(Location, collapse = ", "))
#> # A tibble: 3 x 2
#>   Manager Location
#>   <chr>   <chr>
#> 1 M12     L34, L23
#> 2 M3      L4
#> 3 M45     L1, L2, L5, L11
```
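The grouped summary above lists each manager's locations in one row, rather than the per-location Links column shown in the question. If you want that exact layout, here is a minimal base R sketch (an alternative to the dplyr approach, not part of it) that, for each row, collects the other locations sharing the same manager. The names `links` and `result` are only illustrative:

```r
# Question data
df <- data.frame(Location = c("L1", "L2", "L34", "L5", "L23", "L4", "L11"),
                 Manager  = c("M45", "M45", "M12", "M45", "M12", "M3", "M45"))

# For each row, take every location with the same manager,
# drop the row's own location, and join the rest with commas
links <- vapply(seq_len(nrow(df)), function(i) {
  same_mgr <- df$Location[df$Manager == df$Manager[i]]
  paste(setdiff(same_mgr, df$Location[i]), collapse = ",")
}, character(1))

result <- data.frame(Location = df$Location, Links = links)
result
```

L4 is the only location under M3, so its Links entry is an empty string.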
Your original data frame is already in the correct format to make a graph:
```r
plot(tidygraph::as_tbl_graph(df))
```
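That graph connects each location to its manager. If you would rather connect locations directly to each other whenever they share a manager (the links described in the question), a minimal base R sketch is to build an undirected edge list of every location pair within each manager using `combn()`. The names `pairs`, `edges` and `g` are illustrative, and the igraph step at the end is optional and only runs if that package is installed:

```r
# Question data
df <- data.frame(Location = c("L1", "L2", "L34", "L5", "L23", "L4", "L11"),
                 Manager  = c("M45", "M45", "M12", "M45", "M12", "M3", "M45"))

# For each manager, list every pair of locations it covers;
# managers with a single location (M3) contribute no pairs
pairs <- do.call(rbind, lapply(split(df$Location, df$Manager), function(locs) {
  if (length(locs) < 2) return(NULL)
  t(combn(locs, 2))
}))
edges <- data.frame(from = pairs[, 1], to = pairs[, 2])

# Optional: build an undirected igraph object; passing `vertices` keeps L4,
# which has no links, as an isolated node
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE,
                                     vertices = data.frame(name = df$Location))
}
```

With igraph installed, `plot(g)` then draws the location-to-location network.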
If you want a prettier representation of the graph, you could use ggraph, for example:
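```r
library(dplyr)  # for %>%
library(ggraph)

# Swap the columns so edges run Manager -> Location, then add a root node
# "Managers" that connects to every manager
df[2:1] %>%
  rbind(data.frame(Manager = "Managers", Location = unique(df$Manager))) %>%
  tidygraph::as_tbl_graph() %>%
  ggraph(circular = TRUE) +
  geom_edge_bend() +
  # Root node gets radius 0 (invisible); fill by first letter, "L" or "M"
  geom_node_circle(aes(r = ifelse(name == "Managers", 0, 0.1),
                       fill = substr(name, 1, 1))) +
  geom_node_text(aes(label = ifelse(name == "Managers", "", name))) +
  # Values and labels follow the sorted fill levels: "L" first, then "M"
  scale_fill_manual(values = c("deepskyblue", "gold"),
                    labels = c("Locations", "Managers"),
                    name = NULL) +
  theme_void(base_size = 16) +
  coord_equal()
```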
Question data in reproducible format
```r
df <- data.frame(Location = c("L1", "L2", "L34", "L5", "L23", "L4", "L11"),
                 Manager = c("M45", "M45", "M12", "M45", "M12", "M3", "M45"))
```
Created on 2022-08-31 with reprex v2.0.2