Reputation: 11
I have a document corpus and I need to find all the words that are entirely upper case (i.e., every character in the word is a capital letter) across all the documents, using R. I am not sure how to do that. I have looked at the text mining package 'tm', but I could not find a function in it that does this.
Input String: "Russia Is THE BiggEST cOUNTRY"
Output required: "THE"
How can I do this using the "tm" package?
Upvotes: 1
Views: 1612
Reputation: 2359
You can use gregexpr and regmatches:
abc <- "Russia Is THE BiggEST cOUNTRY"
unlist(regmatches(abc, gregexpr('\\b[A-Z]+\\b', abc)))
[1] "THE"
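Since the question is about a whole corpus, here is a minimal sketch of the same idea applied to several documents at once. It assumes the text of each document has already been pulled out of the tm corpus into a plain character vector; `docs` and its contents are made up for illustration.

```r
# Hypothetical corpus: one string per document (assumed, not from the question)
docs <- c("Russia Is THE BiggEST cOUNTRY",
          "NASA and the EU signed AN agreement",
          "no capitals here")

# gregexpr() and regmatches() are vectorised over their input, so this
# returns a list with one character vector of matches per document
caps_by_doc <- regmatches(docs, gregexpr("\\b[A-Z]+\\b", docs))

# Collapse to the distinct all-caps words found anywhere in the corpus
all_caps <- unique(unlist(caps_by_doc))
all_caps
```

Note that one-letter words such as "A" or "I" also match this pattern; replacing `+` with `{2,}` would require at least two letters.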
Upvotes: 2
Reputation: 23099
With stringr, if you want all such all-caps words as a vector rather than just the first one:
s = "Russia Is THE BiggEST cOUNTRY IN the WORLD"
library(stringr)
unlist(str_match_all(s, "\\b[A-Z]+\\b"))
[1] "THE" "IN" "WORLD"
Upvotes: 2
Reputation: 4554
Try using a regular expression with sub:
string <- "Russia Is THE BiggEST cOUNTRY"
sub('.*(\\b[A-Z]+\\b).*','\\1',string)
#[1] "THE"
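One caveat with the sub approach, sketched below: it can only ever return a single word, and it hands back the input unchanged when nothing matches. The example adds `perl = TRUE`, which is my own choice so that the greedy matching is predictable, and the variable names are made up for illustration.

```r
s <- "Russia Is THE BiggEST cOUNTRY IN the WORLD"
pattern <- "\\b[A-Z]+\\b"
wrapped <- paste0(".*(", pattern, ").*")

# sub() keeps only the one captured group; the greedy leading .* means
# the right-most all-caps word is the one that survives
last_only <- sub(wrapped, "\\1", s, perl = TRUE)

# With no all-caps word present, nothing is replaced and the input comes back
unchanged <- sub(wrapped, "\\1", "no capitals here", perl = TRUE)

# gregexpr() + regmatches() returns every match instead
all_words <- unlist(regmatches(s, gregexpr(pattern, s, perl = TRUE)))
```

So if a document can contain more than one all-caps word, the gregexpr/regmatches route is probably the safer choice.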
Upvotes: 1