Reputation: 49
allWords is a vector of 1.3 million words, with some repetition. What I want to do is create two vectors:
A, containing each distinct word
B, containing the number of occurrences of that word
so that I can later join them in a matrix and thus associate them, like: "mom", 3; "pencil", 14; etc.
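A <- character(0)  # distinct words
B <- integer(0)    # their counts
for (word in allWords) {
  # get a vector with indexes for all repetitions of the word
  temp <- which(allWords == word)
  # skip words already counted and removed; allWords[-integer(0)] would empty the vector
  if (length(temp) == 0) next
  # make "allWords" smaller: remove all copies of this word
  allWords <- allWords[-temp]
  # calculate occurrence
  occ <- length(temp)
  # store
  A <- c(A, word)
  B <- c(B, occ)
}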
This for loop takes forever and I don't really know why or what I am doing wrong. Reading the 1.3 million words from a file takes only about 5 seconds, but with these basic operations the loop never finishes.
Upvotes: 1
Views: 222
Reputation: 14498
You could use a list to make something like hash "key:value" pairs.
data <- c("joe", "tom", "sue", "joe", "jen")
aList <- list()
for (i in data) {
  # looking up a missing name in a list returns NULL, which has length 0
  if (length(aList[[i]]) == 0) {
    aList[[i]] <- 1
  } else {
    aList[[i]] <- aList[[i]] + 1
  }
}
Result:
$joe
[1] 2
$tom
[1] 1
$sue
[1] 1
$jen
[1] 1
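Looking up a list element by name scans the list's names, so with 1.3 million words and many distinct keys this approach can still be slow. A minimal sketch of the same "key:value" idea using an environment created with new.env(hash = TRUE), which R backs with a hash table. The helper name count_words is hypothetical, and it still loops in R, so the vectorised answers (table(), data.table) should be faster on large inputs:

```r
# Count word frequencies using an environment as a hash map.
# count_words is a hypothetical helper, not part of any package.
count_words <- function(words) {
  counts <- new.env(hash = TRUE)  # keys: words, values: integer counts
  for (w in words) {
    # get0() returns ifnotfound (0L) when the word has not been seen yet
    counts[[w]] <- get0(w, envir = counts, inherits = FALSE, ifnotfound = 0L) + 1L
  }
  # all.names = TRUE keeps words that start with "."; ls() returns them sorted
  keys <- ls(counts, all.names = TRUE)
  vals <- unlist(mget(keys, envir = counts), use.names = FALSE)
  list(A = keys, B = vals)
}

res <- count_words(c("joe", "tom", "sue", "joe", "jen"))
res$A  # distinct words
res$B  # matching counts
```

Note that an environment cannot hold an empty-string name, so empty words would need to be filtered out first.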
Upvotes: 0
Reputation: 61164
Given the size of your vector, I think data.table can be a good friend in this situation:
> library(data.table)
> x <- c("dog", "cat", "dog") # Ferdinand.kraft's example vector
> dtx <- data.table(x) # converting `x` vector into a data.table object
> dtx[, .N, by="x"] # Computing the freq for each word
x N
1: dog 2
2: cat 1
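To get the two vectors the question asks for, the grouped result can be split into its columns. A sketch, assuming the column is named word (the answer above uses x) and that sorting by frequency is wanted:

```r
library(data.table)

x <- c("dog", "cat", "dog")
dtx <- data.table(word = x)

# .N is the number of rows in each group, i.e. the count of each word
freq <- dtx[, .N, by = word]

# optional: most frequent words first
setorder(freq, -N)

A <- freq$word  # character vector of distinct words
B <- freq$N     # integer vector of counts
```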
Upvotes: 2
Reputation: 12819
Use table():
> table(c("dog", "cat", "dog"))
cat dog
1 2
The two vectors are the columns of the corresponding data frame:
A <- as.data.frame(table(c("dog", "cat", "dog")))[,1]
B <- as.data.frame(table(c("dog", "cat", "dog")))[,2]
Result:
> A
[1] cat dog
Levels: cat dog
> B
[1] 1 2
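Calling table() once and building a data frame avoids counting twice and keeps the words as character rather than factor. A data frame also fits the "word, count" pairing from the question better than a matrix: a matrix holds a single type, so cbind() of a character A and a numeric B would turn the counts into strings. A sketch with a small made-up allWords:

```r
# small example input; the real allWords has 1.3 million entries
allWords <- c("mom", "pencil", "mom", "dog", "pencil", "mom")

tab <- table(allWords)  # counts each distinct word once
freq <- data.frame(word = names(tab),
                   count = as.vector(tab),
                   stringsAsFactors = FALSE)

A <- freq$word   # character vector, not a factor
B <- freq$count  # integer counts

freq[order(-freq$count), ]  # most frequent words first
```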
Upvotes: 3