Homer Jay Simpson

Reputation: 1218

How can I match all single elements or pairs in a data frame with a group in another data frame in R?

I have a data frame that looks like this:

Groups     elements           p
animals    cat,dog,bird       1
furniture  chair,table        2
vehicles   car,motorcycle     3
House      animals,furniture  4
Commute    bike,rollers       5
Food       pasta,pizza        6
Need       water,power        7
Family     House,Mother       8

The column Groups contains all the groups and hyper groups. For example, animals is a group, but House is a hyper group that contains the groups animals and furniture. The other hyper group is Family, which contains the hyper group House plus Mother. So this is my universe of all possible groups and hyper groups, given the elements that I have. The third column, p, contains the value corresponding to each group or hyper group, which I need later to implement a function.
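To make the nesting concrete, here is a minimal base R sketch of the hierarchy described above. The names `groups`, `parent_of` and `top_group` are illustrative only, and the sketch assumes every element belongs to exactly one group, as in the table.

```r
# Illustrative only: the table above, typed out as a plain data frame.
groups <- data.frame(
  Groups   = c("animals", "furniture", "vehicles", "House",
               "Commute", "Food", "Need", "Family"),
  elements = c("cat,dog,bird", "chair,table", "car,motorcycle",
               "animals,furniture", "bike,rollers", "pasta,pizza",
               "water,power", "House,Mother"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into its members.
members <- strsplit(groups$elements, ",", fixed = TRUE)

# Named lookup: member -> the group that directly contains it.
parent_of <- setNames(rep(groups$Groups, lengths(members)), unlist(members))

# Follow the parent links until a name has no parent (the outermost group).
top_group <- function(x) {
  while (x %in% names(parent_of)) x <- parent_of[[x]]
  x
}

parent_of[["cat"]]  # "animals"
top_group("cat")    # "Family"
top_group("bike")   # "Commute"
```

So cat sits in animals, which sits in House, which sits in Family, while bike only reaches Commute.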

Now I take the first 3 days of a month:

date        var1
2022-01-01  cat
2022-01-01  cat
2022-01-01  cat
2022-01-02  cat,Mother,bike,pasta
2022-01-02  cat,Mother,bike,pasta
2022-01-02  cat,Mother,bike,pasta
2022-01-03  dog,bird
2022-01-03  dog,bird
2022-01-03  dog,bird

Ideally, I want the resulting data frame to look like this:

date        var1                   Groups
2022-01-01  cat                    NA
2022-01-01  cat                    NA
2022-01-01  cat                    NA
2022-01-02  cat,Mother,bike,pasta  Family
2022-01-02  cat,Mother,bike,pasta  Family
2022-01-02  cat,Mother,bike,pasta  Family
2022-01-03  dog,bird               animals
2022-01-03  dog,bird               animals
2022-01-03  dog,bird               animals

How can I implement this in R?

The groups data frame:



library(tibble)  # provides tibble()

Groups = c("animals", "furniture", "vehicles", "House",
           "Commute", "Food", "Need", "Family")
# one comma-separated string of members per group
elements = c("cat,dog,bird", "chair,table", "car,motorcycle", "animals,furniture",
             "bike,rollers", "pasta,pizza", "water,power",
             "House,Mother")
p = seq(1, 8, 1)
data = tibble(Groups, elements, p)
data

And the sample data frame:

date = c(rep(as.Date("2022/1/1"), 3),
         rep(as.Date("2022/1/2"), 3),
         rep(as.Date("2022/1/3"), 3))
var1 = c(rep("cat", 3), rep("cat,Mother,bike,pasta", 3), rep("dog,bird", 3))

df = tibble(date, var1)
df

Any idea how I can combine the two data frames?

Upvotes: 2

Views: 141

Answers (1)

ThomasIsCoding

Reputation: 102181

I think igraph would be a nice helper for your question, since the memberships in data can be represented, and visualized, as a graph:

library(dplyr)   # %>%
library(tidyr)   # separate_rows()
library(igraph)  # graph_from_data_frame()

g <- data %>%
  separate_rows(elements, sep = ",") %>%
  graph_from_data_frame()

plot(g)
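Since the plot itself is not reproduced here, a small sketch may help show what the graph contains. It rebuilds the same group-to-member edges with base R instead of `separate_rows()`. The names `edges`, `roots` and `comp` are illustrative, and igraph is assumed to be installed.

```r
library(igraph)

# The question's table, typed out so the block runs on its own.
Groups   <- c("animals", "furniture", "vehicles", "House",
              "Commute", "Food", "Need", "Family")
elements <- c("cat,dog,bird", "chair,table", "car,motorcycle",
              "animals,furniture", "bike,rollers", "pasta,pizza",
              "water,power", "House,Mother")

# One edge per (group -> member) pair.
parts <- strsplit(elements, ",", fixed = TRUE)
edges <- data.frame(from = rep(Groups, lengths(parts)), to = unlist(parts))
g <- graph_from_data_frame(edges)

# Vertices with no incoming edge are the outermost groups.
roots <- names(which(degree(g, mode = "in") == 0))
roots

# Weakly connected components: each one is a separate "family tree".
comp <- components(g)
comp$no
split(V(g)$name, comp$membership)
```

With this table there are five roots (vehicles, Commute, Food, Need, Family) and five components, so cat and Mother share a component while bike and pasta each sit in a different one.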



We can start by defining a custom function f as below:

library(dplyr)
library(tidyr)
library(igraph)

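f <- function(g, s) {
  # component id of every vertex (components() is the current name of clusters())
  mem <- membership(components(g))
  # vertices without incoming edges are the outermost groups
  roots <- names(which(degree(g, mode = "in") == 0))
  sapply(
    s,
    function(x) {
      clt <- mem[strsplit(x, ",")[[1]]]
      if (length(clt) > 1) {
        # group the elements by component; only components with 2+ elements count
        unlist(tapply(names(clt), clt, function(x) {
          if (length(x) > 1) {
            # root of the component these elements share
            rt <- names(which(mem[roots] == mem[x][1]))
            # parent of the element that is closest to that root
            names(neighbors(g, x[which.min(distances(g, x, rt))], "in"))
          }
        }))
      } else {
        NA
      }
    }
  )
}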
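To see what the innermost step of f does, here is a hedged walk-through for the input "cat,Mother,bike,pasta", where only cat and Mother share a component. The graph is rebuilt with base R so the block runs on its own. The names `edges`, `d` and `closest` are illustrative, and igraph is assumed to be installed.

```r
library(igraph)

# Same group -> member edges as in the question's table.
Groups   <- c("animals", "furniture", "vehicles", "House",
              "Commute", "Food", "Need", "Family")
elements <- c("cat,dog,bird", "chair,table", "car,motorcycle",
              "animals,furniture", "bike,rollers", "pasta,pizza",
              "water,power", "House,Mother")
parts <- strsplit(elements, ",", fixed = TRUE)
edges <- data.frame(from = rep(Groups, lengths(parts)), to = unlist(parts))
g <- graph_from_data_frame(edges)

x  <- c("cat", "Mother")  # elements of the input that share a component
rt <- "Family"            # the root of that component

# distances() ignores edge direction by default (mode = "all").
d <- distances(g, v = x, to = rt)
d

# The element nearest the root, then the group that directly contains it.
closest <- x[which.min(d)]
names(neighbors(g, closest, mode = "in"))
```

Here cat is three steps from Family and Mother is one, so Mother is picked and its parent, Family, is returned. Note that this picks the parent of the element nearest the root, which matches the expected output for these inputs. Whether it is the desired rule for other combinations is an assumption worth checking on your own data.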

Then run:

df %>%
  mutate(Groups = f(
    data %>%
      separate_rows(elements, sep = ",") %>%
      graph_from_data_frame(),
    unique(var1)
  )[match(var1, unique(var1))])

and you will see

        date                  var1  Groups
1 2022-01-01                   cat    <NA>
2 2022-01-01                   cat    <NA>
3 2022-01-01                   cat    <NA>
4 2022-01-02 cat,Mother,bike,pasta  Family
5 2022-01-02 cat,Mother,bike,pasta  Family
6 2022-01-02 cat,Mother,bike,pasta  Family
7 2022-01-03              dog,bird animals
8 2022-01-03              dog,bird animals
9 2022-01-03              dog,bird animals
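The question mentions that the p value of each group is needed later. One possible way to carry it along, sketched here in base R with `match()`, is a simple lookup on the Groups column. The data frames `groups` and `result` below are illustrative stand-ins typed from the tables above, not objects from the code in this answer.

```r
# Illustrative stand-ins for the groups table and the matched result.
groups <- data.frame(
  Groups = c("animals", "furniture", "vehicles", "House",
             "Commute", "Food", "Need", "Family"),
  p      = 1:8,
  stringsAsFactors = FALSE
)
result <- data.frame(
  var1   = c("cat", "cat,Mother,bike,pasta", "dog,bird"),
  Groups = c(NA, "Family", "animals"),
  stringsAsFactors = FALSE
)

# Look up p by group name; rows without a group keep NA.
result$p <- groups$p[match(result$Groups, groups$Groups)]
result
```

Rows whose Groups value is NA get NA for p, Family gets 8 and animals gets 1.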

Upvotes: 2
