Reputation: 13
I have a list of names (e.g. authors) and a PDF file that includes those names. I need to count how many times each author is mentioned in the PDF.
Let's say my table of authors is named "author" and the PDF's text is stored in "pdf" (I already converted the file and stored it in R using pdf_text).
I've tried the following:
author$count <- 0
author$count <- for (i in author$name) { sum(str_count(pdf, i))}
But it didn't work. When I printed author$count, the result was NULL. Is there a way to fix this?
Upvotes: 0
Views: 48
Reputation: 545598
Unlike most other functions, for does not return a meaningful value in R (a loop always evaluates to NULL), which unfortunately makes it much less useful. Instead, in most situations one of the vector mapping functions (lapply, vapply, etc.) is more suitable to the task.
In your case, vapply does the trick:
author$count <- vapply(author$name, \(i) sum(str_count(pdf, i)), integer(1L))
(If you’re using a version of R older than 4.1, you need to replace \(i) with function (i).)
Note that you do not need to assign 0 to author$count beforehand. That value would be overwritten anyway.
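To make the difference concrete, here is a minimal sketch on toy data. The contents of `pdf` and `author` below are hypothetical stand-ins for the objects in the question (a character vector with one string per page, as pdf_text would produce, and a data frame with a name column). It also shows why printing author$count gave NULL: the loop evaluates to NULL, and assigning NULL to a data frame column removes that column.

```r
library(stringr)

# Hypothetical stand-ins for the question's objects.
pdf <- c(
  "Smith and Jones (2019) extend the model of Smith (2015).",
  "Jones later revisited these results."
)
author <- data.frame(
  name = c("Smith", "Jones", "Brown"),
  stringsAsFactors = FALSE
)

# A for loop evaluates to NULL, whatever happens in its body.
loop_result <- for (i in author$name) sum(str_count(pdf, i))
is.null(loop_result)

# vapply calls the function once per name and collects one integer each.
author$count <- vapply(
  author$name,
  function(i) sum(str_count(pdf, i)),
  integer(1L)
)
author
```

Note that str_count treats each name as a regular expression by default, so names containing characters such as "." may need wrapping in stringr's fixed() to be matched literally.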
vapply vs. sapply
vapply ensures that the result of the function call actually conforms to the expected format (here: integer(1L), i.e. every element is a single integer). sapply doesn’t do this, which makes using sapply risky in non-interactive code, since it won’t notify you if there’s an error with the data. The purrr::map_* functions behave similarly to vapply.
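As a rough illustration of that difference, the base R sketch below uses a made-up helper, `evens`, that can return zero, one, or several values per element. The input list is hypothetical as well. sapply quietly changes its return type, while vapply stops with an error.

```r
# Hypothetical input: the helper returns a different number of values per element.
x <- list(a = 1:3, b = 4:6, c = integer(0))
evens <- function(v) v[v %% 2L == 0L]

# sapply cannot simplify results of unequal length, so it silently returns a list.
s <- sapply(x, evens)
is.list(s)

# vapply checks every result against integer(1L) and raises an error instead.
res <- tryCatch(
  vapply(x, evens, integer(1L)),
  error = function(e) conditionMessage(e)
)
res  # the error message, captured as a string
```

In a script, the sapply version would carry on with a list where a vector was expected, and the problem would only surface later, if at all.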
Upvotes: 1
Reputation: 887158
The assignment needs to happen inside the loop. Also, loop over the index sequence so that each result can be assigned to its position:
for (i in seq_along(author$name)) {
  author$count[i] <- sum(str_count(pdf, author$name[i]))
}
Upvotes: 0