R: extracting pattern, different times

Question

I have the following problem: I have a text split into chapters and stored in a character vector. Suppose something like:

text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

I want to extract the topics mentioned in each chapter, so my output should look something like:

output
    [1]      [2]
[1] "topic1"
[2] "topic2" "topic3"
[3]
[4] "topic4"

So some rows have multiple matches and some have none.

I tried str_extract_all() and then unlisting the resulting list, but ran into problems because each chapter yields a different number of matches.
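For comparison, here is a minimal sketch of what str_extract_all() returns for this text, assuming stringr is installed. The list keeps one element per chapter (character(0) when nothing matches), unlist() throws that grouping away, and the documented simplify = TRUE argument instead returns a character matrix padded with empty strings; the last step swaps those for NA. This is an illustration of the problem plus one possible alternative, not the asker's original code.

```r
library(stringr)

text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# One character vector per chapter; chapter 3 has no match, so it is character(0)
matches <- str_extract_all(text, "[Tt]opic\\d+")
lengths(matches)   # 1 2 0 1

# unlist() flattens everything into a single vector, so the link
# between a topic and its chapter is lost
unlist(matches)    # "topic1" "topic2" "topic3" "Topic4"

# simplify = TRUE returns a character matrix, padding short rows with ""
mat <- str_extract_all(text, "[Tt]opic\\d+", simplify = TRUE)

# Optionally mark "no match" cells as NA instead of ""
mat[mat == ""] <- NA
mat
```

The number of columns equals the largest number of matches in any chapter, so the matrix grows automatically if a chapter mentions more topics.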

Thanks to all!

Roman Luštrik · Accepted Answer

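You can use rbind.fill.matrix from plyr: turn each chapter's matches into a one-row matrix, then bind the rows together, filling missing columns with NA.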

text <- c("Here are information about topic1.", 
          "Here are some information about topic2 or topic3.", 
          "Chapter number 4 is really annoying.", 
          "Topic4 is discussed in this chapter.")

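library(stringr)
library(plyr)

# Extract all matches per chapter; the result is a list of character vectors.
# The backslash must be doubled inside an R string ("\\d", not "\d").
xy <- str_extract_all(text, pattern = "[Tt]opic\\d+")
# Turn each vector into a one-row matrix; lapply() always returns a list,
# whereas sapply() could simplify to a single matrix if all lengths matched
xy <- lapply(xy, FUN = function(x) matrix(x, nrow = 1))
rbind.fill.matrix(xy) # from plyr; pads missing columns with NA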

     1        2       
[1,] "topic1" NA      
[2,] "topic2" "topic3"
[3,] NA       NA      
[4,] "Topic4" NA
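If you would rather avoid the dependency on plyr (which is retired), a base-R sketch of the same idea is to extract with regmatches() and gregexpr(), pad every chapter's vector to the longest length (extending a vector with `length<-` fills with NA), and rbind the results. The names matches, n_max, padded and out are illustrative; this is an alternative, not part of the accepted answer.

```r
text <- c("Here are information about topic1.",
          "Here are some information about topic2 or topic3.",
          "Chapter number 4 is really annoying.",
          "Topic4 is discussed in this chapter.")

# Base-R extraction: a list with one character vector per chapter
matches <- regmatches(text, gregexpr("[Tt]opic[0-9]+", text))

# Longest number of matches in any chapter
n_max <- max(lengths(matches))

# Extending a vector's length pads it with NA
padded <- lapply(matches, function(x) {
  length(x) <- n_max
  x
})

# Stack the equal-length vectors into a character matrix, one row per chapter
out <- do.call(rbind, padded)
out
```

The result has the same shape as the plyr output above (NA where a chapter has fewer matches), only with default [,1]/[,2] column labels.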
