jerry_sjtu

Reputation: 5466

How can I build an inverted index from a data frame in R?

Say I have a data frame in R: data.frame(x=1:4, y=c("a b c", "b", "a c", "c"))

  x     y
1 1 a b c
2 2     b
3 3   a c
4 4     c

Now I want to build a new data frame from it: an inverted index, which is common in information retrieval (IR) and recommendation systems:

y    x
a    1 3
b    1 2
c    1 3 4

How can I do this in an efficient way?

Upvotes: 2

Views: 1622

Answers (3)

Matthew Lundberg

Reputation: 42669

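conv <- function(x) {
  # Split each y string into its space-separated tokens once.
  tokens <- strsplit(as.character(x$y), ' ')
  # Collect the x values of every row whose tokens include z.
  # Comparing whole tokens avoids grep()'s substring hits (e.g. "a" inside "ab").
  l <- function(z) {
    paste(x$x[sapply(tokens, function(tk) z %in% tk)], collapse=' ')
  }
  lv <- Vectorize(l)

  alphabet <- unique(unlist(tokens)) # hard-coding this might be preferred for some uses.
  y <- lv(alphabet)
  data.frame(y=names(y), x=y)
}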

x <- data.frame(x=1:4, y=c("a b c", "b", "a c", "c"))
conv(x)
##   y     x
## a a   1 3
## b b   1 2
## c c 1 3 4

Upvotes: 1

Ricardo Saporta

Reputation: 55390

Quick and dirty:

  original.df <- data.frame(x=1:4, y=c("a b c", "b", "a c", "c"))

  original.df$y <- as.character(original.df$y)

  y.split <- strsplit(original.df$y, " ")

  y.unlisted <- unique(unlist(y.split))

  # For each unique element, collect the x values of the rows that contain it.
  new.df <- 
    sapply(y.unlisted, function(element) 
      paste(original.df$x[sapply(y.split, function(y.row) element %in% y.row)], collapse=" " ))

  new.df <- as.data.frame(new.df)

  >  new.df
    new.df
  a    1 3
  b    1 2
  c  1 3 4

Upvotes: 0

thelatemail

Reputation: 93908

An attempt, after converting y to characters:

test <- data.frame(x=1:4,y=c("a b c","b","a c","c"),stringsAsFactors=FALSE)

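# Split each y string into its terms.
result <- strsplit(test$y," ")
# Logical matrix: one row per data-frame row, one column per unique term.
result2 <- sapply(unique(unlist(result)),function(y) sapply(result,function(x) y %in% x))
# For each term, the x values of the rows that contain it.
result3 <- apply(result2,2,function(x) test$x[which(x)])
# Collapse each term's x values into one space-separated string.
final <- data.frame(x=names(result3),y=sapply(result3,paste,collapse=" "))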

> final
  x     y
a a   1 3
b b   1 2
c c 1 3 4
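
On larger inputs, testing every term against every row can get slow. A possible alternative, sketched here in base R, is to expand the data into one (term, id) pair per occurrence and then group the ids by term with split(). This is only a sketch: it assumes terms are separated by single spaces, and the names tokens, long, postings and inverted are illustrative, not taken from the code above.

```r
test <- data.frame(x = 1:4, y = c("a b c", "b", "a c", "c"),
                   stringsAsFactors = FALSE)

# One character vector of terms per row.
tokens <- strsplit(test$y, " ", fixed = TRUE)

# Long form: one (term, id) pair per term occurrence.
long <- data.frame(term = unlist(tokens),
                   id   = rep(test$x, times = lengths(tokens)),
                   stringsAsFactors = FALSE)

# Group the ids by term; split() orders the groups by sorted term.
postings <- split(long$id, long$term)

# Collapse each group into a space-separated string, as in the question.
inverted <- data.frame(
  y = names(postings),
  x = unname(vapply(postings, paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)

inverted
##   y     x
## 1 a   1 3
## 2 b   1 2
## 3 c 1 3 4
```

If the ids are needed as numbers rather than as a pasted string, the intermediate postings list can be used directly, e.g. postings[["c"]].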

Upvotes: 0
