salvu

Reputation: 519

Use apply family between each element in a list and another set in R

I am splitting a text document into n chunks and storing each chunk in a list. Each chunk is converted into a set of words, and a cosine similarity function is then applied between that chunk and a second, shorter text, which is also converted into a set before being passed to the function. I need to pass each chunk in turn to the function to be compared with the second set, and was wondering whether one of the apply family of functions can do the job instead of a loop. It would also save some time if each result were stored in a vector.

This is what I am using (part of the code comes from another source):

# library() attaches one package per call
library("data.table")
library("qdap")
library("sets")
library("lsa")

s <- c("employees businesses san gwann admitted sales taken hit after traffic diversions implemented without notice vjal ir - rihan over weekend.", 
"also complained werent consulted diversion blocked vehicles driving centre      san gwann via roundabout forks san gwann industrial estate, church forced   motorists take detour around block instead.", 
"barriers erected roundabout exit, after youtube video cars disregarding signage passing through roundabout regardless went viral.", 
"planned temporary diversion, brace san gwann influx cars set pass through during works kappara junction project.", 
"usually really busy weekend, our sales lower round, corner store worker maria abela admitted maltatoday.")

c <- "tm dont break whats broken. only queues developing, pass here every morning never experienced such mess notwithstanding tm officials directing traffic. hope report congestion happening area. lc tm tried pro - active hope admit recent traffic changes working."


calculateCosine <- function(setX, setY){
  require(qdap)
  # textmatrix() reads documents from a directory, so create a fresh temporary one
  td <- tempfile()
  dir.create(td)
  y <- c(unlist(as.character(tolower(setY))))
  x <- c(unlist(strsplit(as.character(tolower(setX)), split = ", ")))
  diffLength <- length(y) - length(x)
  x <- bag_o_words(x)
  for(pad in 1 : diffLength){
    x <- c(x, "")
  }
  # write both sets to temp files and calculate cosine similarity
  write(y, file=paste(td, "Dy", sep="/"))
  write(x, file=paste(td, "Dx", sep="/"))
  myMatrix = textmatrix(td, stopwords=stopwords_en, minWordLength = 3)
  similCosine <- as.numeric(round(cosine(myMatrix[,1], myMatrix[,2]), 3))
  return(similCosine)
}

n <- 3
max <- length(s)%/%n
x <- seq_along(s)
d1 <- split(s, ceiling(x/max))
res <- c()
for(i in 1 : length(d1)){
  val <- calculateCosine(as.set(paste(d1[i], sep = " ", collapse = " ")), as.set(c))
  res <- c(res, val)
}

For neatness' sake, would it be possible to replace the loop with one of the apply functions? Any ideas or comments would be greatly appreciated. Thanks.

Upvotes: 1

Views: 61

Answers (1)

Parfait

Reputation: 107767

Consider replacing the two for loops with rep() and sapply():

Inside calculateCosine

# ORIGINAL CODE
x <- bag_o_words(x)
for(pad in 1 : diffLength){
  x <- c(x, "")
  }

# ADJUSTED CODE
x <- bag_o_words(x)
x <- c(x, rep("", diffLength))     

# OR ONE LINE
x <- c(bag_o_words(x), rep("", diffLength))
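
As a quick check of the padding replacement above, here is a minimal base R sketch. The vectors `x` and `y` are made-up stand-ins for the bags of words, and `pad_to` is a hypothetical helper that is not part of the original code. It also guards two edge cases that may be worth handling: `1:0` counts down as `1, 0`, so the loop would append two elements when the lengths are equal, and `rep()` raises an error for a negative `times`.

```r
# Made-up stand-ins for the two bags of words
x <- c("traffic", "sales")
y <- c("traffic", "queues", "report", "area", "changes")
diffLength <- length(y) - length(x)

# Loop version, as in the question
padded_loop <- x
for (pad in 1:diffLength) {
  padded_loop <- c(padded_loop, "")
}

# Vectorized version, as in the adjusted code
padded_rep <- c(x, rep("", diffLength))

# Both approaches give the same result when diffLength is positive
print(identical(padded_loop, padded_rep))

# Hypothetical helper: pad x to length n, leaving it unchanged when it is
# already long enough (max() keeps the 'times' argument from going negative)
pad_to <- function(x, n) c(x, rep("", max(n - length(x), 0)))

print(pad_to(x, length(y)))
print(pad_to(y, length(x)))
```

If `y` can ever be shorter than or equal in length to `x`, the guarded form is probably the safer choice inside calculateCosine.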

Outside calculateCosine (use lapply instead if you need a list returned rather than a vector/matrix)

# ORIGINAL CODE
res <- c()
for(i in 1 : length(d1)){
  val <- calculateCosine(as.set(paste(d1[i], sep = " ", collapse = " ")), as.set(c))
  res <- c(res, val)
}

# ADJUSTED CODE
res <- sapply(d1, function(i) {
  calculateCosine(as.set(paste(i, sep = " ", collapse = " ")), as.set(c))
})
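
To try the pattern without qdap or lsa installed, here is a small base R sketch. `countSharedWords` is a hypothetical stand-in for calculateCosine (it only counts shared words, it is not a cosine similarity), and `chunks` and `comment_text` are made-up data. The comment variable is deliberately not called `c`, so that it does not shadow the base function of the same name.

```r
# Made-up chunks, shaped like the list returned by split()
chunks <- list(`1` = c("traffic sales hit", "sales lower"),
               `2` = c("cars pass roundabout"),
               `3` = c("busy weekend traffic"))
comment_text <- "traffic changes hit sales"

# Hypothetical stand-in for calculateCosine(): number of distinct shared words
countSharedWords <- function(textX, textY) {
  wordsX <- unique(strsplit(textX, " ", fixed = TRUE)[[1]])
  wordsY <- unique(strsplit(textY, " ", fixed = TRUE)[[1]])
  length(intersect(wordsX, wordsY))
}

# Loop version: grow the result vector one value at a time
res_loop <- c()
for (i in seq_along(chunks)) {
  joined <- paste(chunks[[i]], collapse = " ")
  res_loop <- c(res_loop, countSharedWords(joined, comment_text))
}

# sapply version: one call, the result keeps the names of the list
res_sapply <- sapply(chunks, function(chunk) {
  countSharedWords(paste(chunk, collapse = " "), comment_text)
})

# vapply version: same idea, but the return type and length are checked
res_vapply <- vapply(chunks, function(chunk) {
  countSharedWords(paste(chunk, collapse = " "), comment_text)
}, integer(1))

print(res_sapply)
```

Note that sapply passes each element itself to the function (the equivalent of `d1[[i]]`), not the one-element sub-list `d1[i]` used in the original loop, so `paste` receives the character vector of sentences directly. vapply is an optional, stricter alternative when every call should return exactly one number.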

Upvotes: 4
