Reputation: 1218
I have a data frame that looks like this:
Groups | elements | p |
---|---|---|
animals | cat,dog,bird | 1 |
furniture | chair,table | 2 |
vehicles | car,motorcycle | 3 |
House | animals,furniture | 4 |
Commute | bike,rollers | 5 |
Food | pasta,pizza | 6 |
Need | water,power | 7 |
Family | House,Mother | 8 |
The column Groups contains all the groups and hyper groups. For example, animals is a group, whereas House is a hyper group that contains the groups animals and furniture. The other hyper group is Family, which contains the hyper group House plus Mother. So this is my universe of all possible groups and hyper groups, given the elements I have. The third column, "p", contains the value corresponding to each group or hyper group, which I need later for a function.
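To make the hierarchy concrete, here is a minimal base-R sketch that recursively expands a hyper group into the leaf elements it covers. It assumes the table is held in a plain data.frame called groups (mirroring the data tibble defined further down); members and expand_group are hypothetical helper names, not part of any package.

```r
# Same Groups/elements table as in the question, as a base data.frame
groups <- data.frame(
  Groups   = c("animals", "furniture", "vehicles", "House",
               "Commute", "Food", "Need", "Family"),
  elements = c("cat,dog,bird", "chair,table", "car,motorcycle",
               "animals,furniture", "bike,rollers", "pasta,pizza",
               "water,power", "House,Mother"),
  stringsAsFactors = FALSE
)

# Direct members of each group, looked up by group name
members <- setNames(strsplit(groups$elements, ","), groups$Groups)

# Recursively replace any member that is itself a group by that group's members
expand_group <- function(g) {
  unlist(lapply(members[[g]], function(m) {
    if (m %in% names(members)) expand_group(m) else m
  }))
}

expand_group("Family")
```

Here expand_group("Family") returns cat, dog, bird, chair, table and Mother, which is why a row containing both cat and Mother can be attributed to Family.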
Now I take the first 3 days of a month:
date | var1 |
---|---|
2022-01-01 | cat |
2022-01-01 | cat |
2022-01-01 | cat |
2022-01-02 | cat,Mother,bike,pasta |
2022-01-02 | cat,Mother,bike,pasta |
2022-01-02 | cat,Mother,bike,pasta |
2022-01-03 | dog,bird |
2022-01-03 | dog,bird |
2022-01-03 | dog,bird |
Ideally, I want the resulting data frame to look like this:
date | var1 | Groups |
---|---|---|
2022-01-01 | cat | NA |
2022-01-01 | cat | NA |
2022-01-01 | cat | NA |
2022-01-02 | cat,Mother,bike,pasta | Family |
2022-01-02 | cat,Mother,bike,pasta | Family |
2022-01-02 | cat,Mother,bike,pasta | Family |
2022-01-03 | dog,bird | animals |
2022-01-03 | dog,bird | animals |
2022-01-03 | dog,bird | animals |
How can I implement this in R?
The groups data frame:
library(tibble)
Groups = c("animals", "furniture", "vehicles", "House",
           "Commute", "Food", "Need", "Family")
elements = c("cat,dog,bird", "chair,table", "car,motorcycle", "animals,furniture",
             "bike,rollers", "pasta,pizza", "water,power", "House,Mother")
p = seq(1, 8, 1)
data = tibble(Groups, elements, p); data
and the sample data frame:
date = c(rep(as.Date("2022/1/1"),3),
rep(as.Date("2022/1/2"),3),
rep(as.Date("2022/1/3"),3))
var1 = c(rep("cat",3),rep("cat,Mother,bike,pasta",3),rep("dog,bird",3))
df = tibble(date,var1);df
Any idea how I can combine the two data frames?
Upvotes: 2
Views: 141
Reputation: 102181
I guess igraph would be a good fit for your question, since the memberships in data form a hierarchy that can be represented (and visualized) as a directed graph:
g <- data %>%
  separate_rows(elements, sep = ",") %>%
  graph_from_data_frame()
plot(g)
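As a rough base-R sketch of what that graph contains: after separate_rows each row is one parent -> child edge, and graph_from_data_frame uses the first two columns as the edge list and keeps the remaining column (p) as an edge attribute. The roots used inside f below (vertices with in-degree 0) are simply the groups that never appear as anyone's child. The names groups, members, edges and roots are assumptions made for this illustration.

```r
# Same table as data in the question, as a base data.frame
groups <- data.frame(
  Groups   = c("animals", "furniture", "vehicles", "House",
               "Commute", "Food", "Need", "Family"),
  elements = c("cat,dog,bird", "chair,table", "car,motorcycle",
               "animals,furniture", "bike,rollers", "pasta,pizza",
               "water,power", "House,Mother"),
  p        = 1:8,
  stringsAsFactors = FALSE
)

# Split each comma-separated list; one list entry per group
members <- strsplit(groups$elements, ",")

# One row per parent -> child edge, the same long format separate_rows() produces
edges <- data.frame(
  from = rep(groups$Groups, lengths(members)),
  to   = unlist(members),
  p    = rep(groups$p, lengths(members)),
  stringsAsFactors = FALSE
)

# Roots (in-degree 0): groups that never appear in the "to" column
roots <- setdiff(edges$from, edges$to)
roots  # -> c("vehicles", "Commute", "Food", "Need", "Family")
```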
We can start by defining a custom function f like below:
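library(dplyr)
library(tidyr)
library(igraph)
f <- function(g, s) {
  # component id of every vertex (components() replaces the deprecated clusters())
  mem <- membership(components(g))
  sapply(
    s,
    function(x) {
      # component ids of the items listed in one var1 string
      clt <- mem[strsplit(x, ",")[[1]]]
      if (length(clt) > 1) {
        # handle the items of each connected component separately
        unlist(tapply(names(clt), clt, function(x) {
          if (length(x) > 1) {
            # root of this component: the vertex with no incoming edge
            rt <- names(which(mem[names(which(degree(g, mode = "in") == 0))] == mem[x][1]))
            # return the parent of the item that is closest to the root
            names(neighbors(g, x[which.min(distances(g, x, rt))], "in"))
          }
        }))
      } else {
        NA
      }
    }
  )
}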
then run
df %>%
  mutate(Groups = f(
    data %>%
      separate_rows(elements, sep = ",") %>%
      graph_from_data_frame(),
    unique(var1)
  )[match(var1, unique(var1))])
and you will see
date var1 Groups
1 2022-01-01 cat <NA>
2 2022-01-01 cat <NA>
3 2022-01-01 cat <NA>
4 2022-01-02 cat,Mother,bike,pasta Family
5 2022-01-02 cat,Mother,bike,pasta Family
6 2022-01-02 cat,Mother,bike,pasta Family
7 2022-01-03 dog,bird animals
8 2022-01-03 dog,bird animals
9 2022-01-03 dog,bird animals
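If you would rather avoid the igraph dependency, a base-R sketch of a related idea is to take the lowest common ancestor of the listed items. This is not what f does (f uses distances to the component root), and it assumes every item belongs to at most one parent group and the hierarchy has no cycles. When the items fall into several unrelated top-level groups that each get a result, the results are pasted together; that behaviour is an assumption about what you want. The helpers ancestors and lowest_common_group are hypothetical names.

```r
groups <- data.frame(
  Groups   = c("animals", "furniture", "vehicles", "House",
               "Commute", "Food", "Need", "Family"),
  elements = c("cat,dog,bird", "chair,table", "car,motorcycle",
               "animals,furniture", "bike,rollers", "pasta,pizza",
               "water,power", "House,Mother"),
  stringsAsFactors = FALSE
)

members <- strsplit(groups$elements, ",")
# Child -> parent lookup (assumes every item has at most one parent)
parent_of <- setNames(rep(groups$Groups, lengths(members)), unlist(members))

# Parent, grandparent, ... of x, nearest first; empty for a top-level group
ancestors <- function(x) {
  out <- character(0)
  repeat {
    p <- unname(parent_of[x])
    if (is.na(p)) break
    out <- c(out, p)
    x <- p
  }
  out
}

lowest_common_group <- function(s) {
  items <- strsplit(s, ",")[[1]]
  # each item followed by its ancestors, so an item can be its own common group
  chains <- lapply(items, function(x) c(x, ancestors(x)))
  # the last entry of each chain is the item's top-level group
  roots <- vapply(chains, function(ch) ch[length(ch)], character(1))
  hits <- character(0)
  for (r in unique(roots)) {
    idx <- which(roots == r)
    if (length(idx) > 1) {
      # intersect() keeps the order of its first argument, so [1] is the lowest shared group
      hits <- c(hits, Reduce(intersect, chains[idx])[1])
    }
  }
  if (length(hits) == 0) NA_character_ else paste(hits, collapse = ",")
}

df <- data.frame(
  date = rep(as.Date(c("2022-01-01", "2022-01-02", "2022-01-03")), each = 3),
  var1 = rep(c("cat", "cat,Mother,bike,pasta", "dog,bird"), each = 3),
  stringsAsFactors = FALSE
)
df$Groups <- vapply(df$var1, lowest_common_group, character(1), USE.NAMES = FALSE)
df
```

On the sample data this gives NA for cat, Family for cat,Mother,bike,pasta and animals for dog,bird; a single item never has a partner to share a group with, hence the NA.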
Upvotes: 2