Reputation: 49

R takes forever to compute a simple procedure

allWords is a vector of 1.3 million words, with some repetition. What I want to do is create two vectors:

A with the word

B with the number of occurrences of the word

so that I can later join them in a matrix and thus associate them, like: "mom", 3; "pencil", 14, etc.

for(word in allWords){

    #get a vector with indexes for all repetitions of a word
    temp <- which(allWords==word) 
    #Make "allWords" smaller - remove duplicates
    allWords= allWords[-which(allWords==word)]
    #Calculate occurrence
    occ<-length(temp)
    #store
    A = c(A,word)
    B = c(B,occ)
}

This for loop takes forever and I don't really know why or what I am doing wrong. Reading the 1.3 million words from a file takes only about 5 seconds, but with these basic operations the loop never seems to finish.

Upvotes: 1

Answers (3)

AGS

Reputation: 14498

You could use a list to build something like a hash of "key:value" pairs.

data = c("joe", "tom", "sue", "joe", "jen")

aList = list()

for(i in data){
    if (length(aList[[i]]) == 0){
      aList[[i]] = 1
    } else {
      aList[[i]] = aList[[i]] + 1
    }
}

Result

$joe
[1] 2

$tom
[1] 1

$sue
[1] 1

$jen
[1] 1
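
To get from this list to the two vectors the question asks for, the names and the counts can be pulled out separately. This is only a minimal sketch: it repeats the loop above (with is.null() as the emptiness check, which is an equivalent test here) and borrows the names A and B from the question.

```r
data <- c("joe", "tom", "sue", "joe", "jen")

# Build the key:value list as above
aList <- list()
for (i in data) {
  if (is.null(aList[[i]])) {
    aList[[i]] <- 1                  # first time this word is seen
  } else {
    aList[[i]] <- aList[[i]] + 1     # word seen before: bump its count
  }
}

A <- names(aList)                       # distinct words, in order of first appearance
B <- unlist(aList, use.names = FALSE)   # the matching counts
```

A and B line up position by position, so A[1] is "joe" and B[1] is its count.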

Upvotes: 0

Jilber Urbina

Reputation: 61164

Given the size of your vector, I think data.table can be a good friend in this situation:

> library(data.table)
> x <- c("dog", "cat", "dog")  # Ferdinand.kraft's example vector
> dtx <- data.table(x)         # converting `x` vector into a data.table object
> dtx[, .N, by="x"]            # Computing the freq for each word
     x N
1: dog 2
2: cat 1

Upvotes: 2

Ferdinand.kraft

Reputation: 12819

Use table():

> table(c("dog", "cat", "dog"))

cat dog 
  1   2

The vectors are columns of the corresponding dataframe:

counts <- as.data.frame(table(c("dog", "cat", "dog")))
A <- counts[,1]
B <- counts[,2]

Result:

> A
[1] cat dog
Levels: cat dog
> B
[1] 1 2
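
Applied to the question's vector, the same idea might look like the sketch below. The short allWords here is a made-up stand-in for the real 1.3 million words, and the sorting step is optional. names() and as.vector() are used so that A comes out as a plain character vector rather than a factor. Unlike the loop in the question, which rescans the whole vector with which() for every word and regrows A and B with c() on each iteration, the counting happens in a single call.

```r
# Hypothetical stand-in for the real 1.3-million-word vector
allWords <- c("mom", "pencil", "mom", "dog", "pencil", "mom")

counts <- table(allWords)                   # count every distinct word in one call
counts <- sort(counts, decreasing = TRUE)   # optional: most frequent words first

A <- names(counts)       # character vector of distinct words
B <- as.vector(counts)   # integer vector of occurrences, aligned with A

# Pair them up; a data.frame keeps the counts numeric
result <- data.frame(word = A, count = B, stringsAsFactors = FALSE)
```

A data.frame is suggested here instead of a matrix because a matrix holds a single type, so combining A and B with cbind() would turn the counts into strings.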

Upvotes: 3
