Reputation: 539
I'm trying to use udpipe's RAKE to generate a list of the top 25 RAKE keywords for each document in a dataframe, then write those keywords (plus a simple str_count) back to the dataframe. I wrote a for loop to handle this, but it writes the same result to every row instead of a different result to each row.
Packages installed and used are udpipe, dplyr, stringi, stringr, data.table.
annotation$length <- nchar(annotation$token)
annotation <- annotation %>% filter(length >= 3)
counter <- textdf$doc_id
for (i in counter) {
  subannotation <- annotation %>% filter(doc_id == i)
  stats <- keywords_rake(
    x = subannotation,
    term = "token", # token or lemma
    group = "doc_id",
    ngram_max = 3,
    n_min = 1,
    relevant = subannotation$upos %in% c("NOUN", "VERB", "ADV", "ADJ")
  )
  stats <- stats %>% top_n(25, rake)
  checktopics <- paste(stats$keyword, collapse = " ")
  textdf$topics <- checktopics
  textdf$score <- str_count(checktopics, "cheese")
}
The intended outcome should be something like:
id  score  topics
1   12     chocolate chocoholics cheese
2   1      plastic waste cheese
3   3      neuroscientists data system
The current outcome is:
id  score  topics
1   3      neuroscientists data system
2   3      neuroscientists data system
3   3      neuroscientists data system
What am I doing wrong?
Thank you!
Upvotes: 1
Views: 508
Reputation: 539
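The fix is to index the row inside the loop. Assigning to textdf$topics without an index overwrites the entire column on every iteration, so every row ends up holding the last document's result. Derp.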
textdf$topics[i] <- checktopics
textdf$score[i] <- str_count(checktopics,"cheese")
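Note that indexing with [i] only lines up when the doc_id values happen to be the row numbers 1, 2, 3, ... as in the example above. Here is a minimal sketch of the same idea that selects the row by matching doc_id instead. The toy data, topics_by_doc, and count_word are hypothetical stand-ins (for the RAKE output and for str_count) so that it runs in base R without udpipe:

```r
# Toy stand-ins for the real data (hypothetical values, no udpipe needed).
textdf <- data.frame(doc_id = c("a", "b", "c"), stringsAsFactors = FALSE)
topics_by_doc <- list(
  a = "chocolate cheese cheese",
  b = "plastic waste cheese",
  c = "neuroscientists data system"
)

# Hypothetical helper standing in for str_count(x, "cheese"), base R only.
count_word <- function(x, word) {
  lengths(regmatches(x, gregexpr(word, x, fixed = TRUE)))
}

# Pre-allocate the result columns so every row starts out empty.
textdf$topics <- NA_character_
textdf$score <- NA_integer_

for (i in textdf$doc_id) {
  checktopics <- topics_by_doc[[i]]  # stands in for the pasted RAKE keywords
  row <- textdf$doc_id == i          # select the row by id, not by position
  textdf$topics[row] <- checktopics
  textdf$score[row] <- count_word(checktopics, "cheese")
}

print(textdf)
```

Each iteration now touches only the row whose doc_id matches, so earlier results are not overwritten by later ones.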
Upvotes: 1