Reputation: 33
I am using the R tm package and I am trying to select certain documents by their index and their metadata:
orbit_corpus <- Corpus(tm_corpus, readerControl = list(reader = myReader))
meta(orbit_corpus[[1]])
author : a8
origin : Department
heading : WhiB
id : 1
year : 2013
I would like to find all documents within the first hundred documents of my corpus that were published in 2013. The following works to check whether the metadata 'year' for document 1 is 2013:
meta(orbit_corpus[[1]],"year") == 2013
[1] TRUE
I need something that returns the indexes of all documents among the first 100 that meet this criterion. I imagine something similar to the following, but it does not work (and even if it did, it would probably not return a list of the documents):
meta(orbit_corpus[[1:100]],"year") == 2013
Error in x$content[[i]] : recursive indexing failed at level 4
Many thanks for the help!
Upvotes: 3
Views: 2674
Reputation: 21621
You could use tm_filter on the first 100 documents of your corpus (orbit_corpus[1:100]):
tm_filter(orbit_corpus[1:100], FUN = function(x) meta(x)[["year"]] == "2013")
From the documentation: tm_filter returns a corpus containing documents where FUN matches.
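Since the question asks for the indexes rather than the documents themselves, here is a minimal sketch using tm_index, the companion of tm_filter that returns a logical vector instead of a corpus. The toy corpus, its texts, and the year values are made up for illustration and stand in for orbit_corpus; the name toy_corpus is hypothetical.

```r
library(tm)

# Hypothetical stand-in for orbit_corpus: three short documents
toy_corpus <- VCorpus(VectorSource(c("first doc", "second doc", "third doc")))

# Attach a made-up 'year' metadata value to each document
years <- c(2013, 2012, 2013)
for (i in seq_along(toy_corpus)) {
  meta(toy_corpus[[i]], "year") <- years[i]
}

# Single brackets subset the corpus; double brackets extract one document,
# which is why orbit_corpus[[1:100]] fails
n <- min(100, length(toy_corpus))
first_docs <- toy_corpus[seq_len(n)]

# tm_index applies FUN to each document and returns a logical vector
hits <- tm_index(first_docs, FUN = function(x) meta(x, "year") == 2013)

# Positions of the matching documents within the subset
idx <- unname(which(hits))

# The matching documents themselves, equivalent to what tm_filter returns
docs_2013 <- first_docs[hits]
```

Here idx holds positions relative to the subset, which coincide with positions in the full corpus because the subset starts at document 1.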
Upvotes: 4