Reputation: 789
Given:
text <- "fsfs blabla Honda t Asus"
I want to get this result:
[1] "Honda" "Asus"
I have done it with this function:
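foo <- function(txt){
  txtNew <- txt
  txtNew2 <- txt
  txtMemory <- ""
  # repeat until a pass no longer changes the string
  while(txtNew != txtMemory){
    txtNew <- txtNew2
    txtMemory <- txtNew2
    # remove each white space together with the lowercase letter after it
    txtNew <- gsub("\\s[a-z]","",txtNew)
    # re-add a leading space so the first word is also eaten on the next pass
    txtNew2 <- paste0(" ", txtNew)
  }
  txtNew <- sub("^\\s+", "", txtNew)
  # strsplit() returns a list; [[1]] gives the plain character vector
  strsplit(txtNew, " ")[[1]]
}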
foo("fsfs blabla Honda t Asus")
But I guess there is a much easier way in R?
Upvotes: 0
Views: 3051
Reputation: 455
I would do it like this (in JavaScript):
const str = "fsfs blabla Honda t Asus";
const regex = /([A-Z]\w+)/g;
const result = [];
let m;
while ((m = regex.exec(str)) !== null) result.push(m[1]);
$('#result').html(JSON.stringify(result));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<p id="result"></p>
Upvotes: 0
Reputation: 54237
Here's a solution without regular expressions:
text <- "fsfs blabla Honda t Asus"
x <- strsplit(text, " ", fixed = TRUE)[[1]]
x[substr(x, 1, 1) %in% LETTERS]
# [1] "Honda" "Asus"
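A minimal sketch wrapping the same idea in a function so it also works on a vector of several strings; the name `capitalised_words` is hypothetical. Splitting on a single space produces empty strings where there are consecutive spaces, but `substr("", 1, 1)` is `""`, which is not in `LETTERS`, so those tokens are dropped automatically. Like the answer above, this only recognises the ASCII capitals A-Z.

```r
# Hypothetical helper: for each input string, keep the words whose
# first character is an uppercase ASCII letter (A-Z)
capitalised_words <- function(txt) {
  words <- strsplit(txt, " ", fixed = TRUE)  # one character vector per input string
  lapply(words, function(w) w[substr(w, 1, 1) %in% LETTERS])
}

capitalised_words(c("fsfs blabla Honda t Asus", "an  Apple and a  Pear"))
# [[1]]
# [1] "Honda" "Asus"
#
# [[2]]
# [1] "Apple" "Pear"
```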
Upvotes: 3
Reputation: 38510
In base R, you could do:
grep("^[A-Z]", scan(textConnection("fsfs blabla Honda t Asus"), what = ""), value = TRUE)
# Read 5 items
# [1] "Honda" "Asus"
Here, scan reads in the text and splits it by white space. Then grep with value = TRUE returns all elements of the character vector that match the regular expression "^[A-Z]", which can be read as "starts with a capital letter."
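If the "Read 5 items" message is unwanted, `scan()` can take the string directly through its `text` argument and be silenced with `quiet = TRUE`. A minimal sketch (one caveat, assuming default settings: `scan()` interprets quote characters, so text containing apostrophes may need `quote = ""`):

```r
# scan() splits on white space by default; what = "" reads character values
words <- scan(text = "fsfs blabla Honda t Asus", what = "", quiet = TRUE)
grep("^[A-Z]", words, value = TRUE)
# [1] "Honda" "Asus"
```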
In place of scan, you could use strsplit/unlist for the same result:
grep("^[A-Z]", unlist(strsplit("fsfs blabla Honda t Asus", " ")), value=TRUE)
Upvotes: 4
Reputation: 887213
We can use str_extract_all to match a word boundary (\\b) followed by a capital letter ([A-Z]) followed by one or more word characters (\\w+):
library(stringr)
str_extract_all(text, "\\b[A-Z]\\w+")[[1]]
#[1] "Honda" "Asus"
Or with gregexpr/regmatches from base R:
regmatches(text, gregexpr("\\b[A-Z]\\w+", text))[[1]]
#[1] "Honda" "Asus"
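To see what each part of the pattern contributes, here is a small base R comparison on a made-up sample string; the helper name `matches` is hypothetical. `\\b` stops a match from starting in the middle of a word such as `fooBar`, and `\\w+` requires at least one more word character after the capital, so a one-letter word like `I` is skipped unless `\\w*` is used instead.

```r
s <- "I met fooBar at Honda"

# Hypothetical helper: all matches of a pattern in a single string
matches <- function(pattern, x) regmatches(x, gregexpr(pattern, x))[[1]]

matches("\\b[A-Z]\\w+", s)  # word boundary, at least 2 characters
# [1] "Honda"
matches("[A-Z]\\w+", s)     # no boundary: also matches inside "fooBar"
# [1] "Bar"   "Honda"
matches("\\b[A-Z]\\w*", s)  # \\w* also keeps the single-letter "I"
# [1] "I"     "Honda"
```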
Upvotes: 4