W W

Reputation: 789

How to get all words which start with a capital letter?

Given:

text <- "fsfs blabla Honda t Asus"

I want to get this result:

[1] "Honda" "Asus"

I did it with this function:

foo <- function(txt){
  txtNew <- txt
  txtNew2 <- txt
  txtMemory <- ""
  while(txtNew != txtMemory){
    txtNew <- txtNew2
    txtMemory <- txtNew2
    txtNew <- gsub("\\s[a-z]","",txtNew)
    txtNew2 <- paste0(" ", txtNew)
  }
  txtNew <- sub("^\\s+", "", txtNew)
  strsplit(txtNew, " ")
}
foo("fsfs blabla Honda t Asus")

but I guess there is a much easier way in R?

Upvotes: 0

Views: 3051

Answers (4)

Flavio Sousa

Reputation: 455

I would do it like this (in JavaScript):

const str = "fsfs blabla Honda t Asus";
const regex = /([A-Z]\w+)/g;
const result = [];
let m;
while ((m = regex.exec(str)) !== null) result.push(m[1]);
$('#result').html(JSON.stringify(result));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p id="result"></p>

Upvotes: 0

lukeA

Reputation: 54237

Here's a solution without regular expressions:

text <- "fsfs blabla Honda t Asus"
x <- strsplit(text, " ", fixed = TRUE)[[1]]
x[substr(x, 1, 1) %in% LETTERS]
# [1] "Honda" "Asus" 

Upvotes: 3

lmo

Reputation: 38510

In base R, you could do

grep("^[A-Z]", scan(textConnection("fsfs blabla Honda t Asus"), ""), value=TRUE)
Read 5 items
[1] "Honda" "Asus" 

Here, scan reads in the text and splits it on whitespace. Then grep with value=TRUE returns all elements of the character vector that match the regular expression "^[A-Z]", which can be read as "starts with a capital letter."

In place of scan, you could use strsplit / unlist for the same result.

grep("^[A-Z]", unlist(strsplit("fsfs blabla Honda t Asus", " ")), value=TRUE)
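If the input may contain tabs or repeated spaces, the strsplit / grep approach can be wrapped in a small function that splits on runs of whitespace instead of a single space. This is only a sketch: the name capital_words is made up for illustration, and it assumes "capital letter" means ASCII A-Z (use "^[[:upper:]]" if locale-aware matching is wanted).

```r
# Hypothetical helper, not part of the answer above.
capital_words <- function(txt) {
  # Split on one or more whitespace characters (spaces, tabs, newlines).
  words <- unlist(strsplit(txt, "\\s+"))
  # Keep only the words whose first character is an ASCII capital letter.
  grep("^[A-Z]", words, value = TRUE)
}

capital_words("fsfs blabla  Honda\tt Asus")
# [1] "Honda" "Asus"
```

Splitting first and filtering afterwards keeps the regular expression trivial, at the cost of building the full word vector.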

Upvotes: 4

akrun

Reputation: 887213

We can use str_extract_all to match a word boundary (\\b) followed by a capital letter ([A-Z]) followed by one or more word characters (\\w+)

library(stringr)
str_extract_all(text, "\\b[A-Z]\\w+")[[1]]
#[1] "Honda" "Asus" 

Or with gregexpr/regmatches from base R

regmatches(text, gregexpr("\\b[A-Z]\\w+", text))[[1]]
#[1] "Honda" "Asus" 
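One thing to be aware of: because the pattern ends in \\w+, it needs at least one word character after the capital, so a one-letter word such as "A" is skipped. If those should be kept, a variant with \\w* might look like the sketch below. The extra "A" in the input and the perl = TRUE argument are my additions for illustration, not part of the original answer.

```r
# Same idea as above, but \\w* also allows a capital letter standing alone.
text <- "fsfs blabla Honda t Asus A"
caps <- regmatches(text, gregexpr("\\b[A-Z]\\w*", text, perl = TRUE))[[1]]
caps
# expected elements: "Honda", "Asus", "A"
```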

Upvotes: 4
