Reputation: 13
I have this code to find duplicates in a data frame using cosine similarity. The outer loop runs nrow times and takes one tweet on each pass; the inner loop then compares the cosine similarity of that tweet against the other tweets.
Here is my code:
for (i in 1:nrow(temp)) {
  dup=0
  one_Tweets = tweets$Tweet[i]
  cos_similarity = data.frame("v1"=NULL) # NULL So that don't write previous value
  cos_similarity=data.frame(sim <- round( sim.strings(AllTweets,one_Tweets), digits = 3) )
  names(cos_similarity) = c( "v1")
  for (b in i+1:nrow(temp)) {
    Tweet_cos=cos_similarity$v1[b]
    if ( Tweet_cos >= 0.900) {
      count = count+1
      tweets$flag[b]= 1
    }else { #if ( Tweet_cos <0.900) {
      tweets$flag[b]= 2
    }
    Tweet_cos=0
  }
  dup=tweets$duplicate[i]= tweets$duplicate[i]+count
  count = 0
}
My problem is with the outer loop: it is entered only once, although the data frame contains 10000 tweets,
and I get this error:
Error in if (Tweet_cos >= 0.9) { : missing value where TRUE/FALSE needed
Upvotes: 1
Views: 101
Reputation: 3554
I don't have enough reputation yet to post this as a comment, but I think you are getting this error because Tweet_cos is NA at some point. To debug, remove this part of the code:
for (b in i+1:nrow(temp)) {
  Tweet_cos=cos_similarity$v1[b]
  if ( Tweet_cos >= 0.900) {
    count = count+1
    tweets$flag[b]= 1
  }else { #if ( Tweet_cos <0.900) {
    tweets$flag[b]= 2
  }
  Tweet_cos=0
}
dup=tweets$duplicate[i]= tweets$duplicate[i]+count
count = 0
and replace all of it with print(cos_similarity$v1). You should see some NA values, which by definition cannot be compared with 0.9, hence the error.
If there are too many iterations to read the output, print the values of i and b at the point where the error occurs, and print cos_similarity$v1 only for that iteration.
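One likely source of those NA values, as far as I can tell from the posted loop, is operator precedence: in R `:` binds tighter than `+`, so `i+1:nrow(temp)` is read as `i + (1:nrow(temp))` and b runs past the last row. Here is a minimal sketch of that effect and of the print-based check described above. The vector `v1` holds made-up scores standing in for cos_similarity$v1, and the names `idx_as_written` and `idx_intended` are only for illustration.

```r
# Made-up similarity scores standing in for cos_similarity$v1.
v1 <- c(1.000, 0.950, 0.200, 0.910, 0.100)
n <- length(v1)
i <- 1

# `:` binds tighter than `+`, so i+1:n means i + (1:n), not (i+1):n.
idx_as_written <- i+1:n     # runs one past the end of v1 when i is 1
idx_intended   <- (i+1):n   # stops at the last element

# Indexing past the end of a vector returns NA instead of failing.
scores <- v1[idx_as_written]

# Report where the NA appears instead of letting if() stop the loop.
count <- 0
for (b in idx_as_written) {
  if (is.na(v1[b])) {
    cat("NA at i =", i, "and b =", b, "\n")
    next
  }
  if (v1[b] >= 0.9) count <- count + 1
}
```

If this is what happens in your data, the error is raised during the first pass of the outer loop, which would explain why it looks as if the loop is entered only once. Writing the range as (i+1):nrow(temp) should avoid it, but note that for the last row that range counts backwards, so the inner loop would need to be skipped there.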
Please consider sharing a small sample of your data so that others can reproduce your problem.
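A common way to share such a sample is dput(), which prints an object as R code that others can paste back in. The sketch below uses a hypothetical three-row stand-in for your `tweets` data frame; the column names are taken from your code, and the contents are invented.

```r
# Hypothetical stand-in for the real `tweets` data frame.
tweets <- data.frame(
  Tweet = c("good morning", "good morning!", "something else"),
  duplicate = 0,
  flag = NA,
  stringsAsFactors = FALSE
)

# Capture the dput() text of the first two rows; this is what you would post.
sample_txt <- capture.output(dput(head(tweets, 2)))
cat(sample_txt, sep = "\n")

# Anyone reading the post can rebuild the same rows from that text.
rebuilt <- eval(parse(text = sample_txt))
```

A handful of rows that still trigger the error is usually enough.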
Upvotes: 0