Reputation: 296
I have the following problem: I have a text that is split into chapters and stored in a character vector, one chapter per element. Suppose something like:
text <- c("Here are information about topic1.",
"Here are some information about topic2 or topic3.",
"Chapter number 4 is really annoying.",
"Topic4 is discussed in this chapter.")
I want to extract the topics mentioned in each chapter, so my output should look something like this:
output
     [1]      [2]
[1]  "topic1"
[2]  "topic2" "topic3"
[3]
[4]  "Topic4"
Some chapters contain several matches and some contain none.
I tried str_extract_all() and then unlist() on the resulting list, but ran into problems because each chapter yields a different number of matches.
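A minimal sketch of why unlist() does not help, assuming a pattern such as "[Tt]opic\\d+" (the question itself does not state one). str_extract_all() keeps one vector per chapter, but unlist() flattens them into a single vector and the chapter boundaries are lost:

```r
library(stringr)

text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# str_extract_all() returns a list: one character vector of matches per chapter
# (the pattern is an assumption; adjust it to your real topic names)
matches <- str_extract_all(text, "[Tt]opic\\d+")

# number of matches per chapter; chapter 3 has none
lengths(matches)   # 1 2 0 1

# unlist() collapses everything into one vector, so we no longer
# know which topic came from which chapter
flat <- unlist(matches)
flat               # "topic1" "topic2" "topic3" "Topic4"
```

The ragged lengths (1, 2, 0, 1) are exactly what has to be padded before the results can be stacked into a rectangular matrix.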
Thanks to all!
Upvotes: 0
Views: 37
Reputation: 70653
You can use rbind.fill.matrix() from plyr, which stacks matrices with different numbers of columns and fills the missing cells with NA.
text <- c("Here are information about topic1.",
"Here are some information about topic2 or topic3.",
"Chapter number 4 is really annoying.",
"Topic4 is discussed in this chapter.")
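library(stringr)
library(plyr)
# one character vector of matches per chapter (character(0) if nothing matches)
xy <- str_extract_all(text, pattern = "[Tt]opic\\d+")
# turn each vector into a one-row matrix; lapply() always returns a list,
# whereas sapply() could try to simplify the result if all lengths happened to match
xy <- lapply(xy, FUN = function(x) matrix(x, nrow = 1))
# stack the rows, padding missing columns with NA
rbind.fill.matrix(xy) # from plyr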
     1        2
[1,] "topic1" NA
[2,] "topic2" "topic3"
[3,] NA       NA
[4,] "Topic4" NA
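If you would rather avoid plyr, a possible alternative (a sketch, not part of the original answer) is stringr's own simplify = TRUE argument, which returns a character matrix with one row per chapter and pads shorter rows with "". The "" to NA conversion at the end is optional and only there to mirror the rbind.fill.matrix() output:

```r
library(stringr)

text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# simplify = TRUE returns a character matrix instead of a list;
# chapters with fewer matches are padded with ""
m <- str_extract_all(text, "[Tt]opic\\d+", simplify = TRUE)

# optional: turn the "" padding into NA
# (which() skips any NA comparisons, so the assignment is safe)
m[which(m == "")] <- NA
m
```

Keeping the "" padding instead is closer to the blank cells in the desired output from the question.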
Upvotes: 4