Reputation: 93
I need to extract the first 2 words from a string. If the string contains more than 2 words, it should return the first 2 words; if it contains fewer than 2 words, it should return the string as is.
I've tried the word() function from the stringr package, but it doesn't give the desired output when the string has fewer than 2 words.
word(dt$var_containing_strings, 1,2, sep=" ")
Example:
Input String: Auto Loan (Personal)
Output: Auto Loan
Input String: Others
Output: Others
Upvotes: 6
Views: 6507
Reputation: 388982
You could use a regex in base R with sub:
sub("(\\w+\\s+\\w+).*", "\\1", "Auto Loan (Personal)")
#[1] "Auto Loan"
This also works if the text has only one word, because sub returns the input unchanged when the pattern does not match:
sub("(\\w+\\s+\\w+).*", "\\1", "Auto")
#[1] "Auto"
Explanation:
Here we extract the pattern shown inside round brackets, which is (\\w+\\s+\\w+). It means: \\w+ (one word), followed by \\s+ (whitespace), followed by \\w+ (another word), so in total we extract two words. The trailing .* matches the rest of the string, and the backreference \\1 in sub replaces the whole match with the two captured words.
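Since sub is vectorised, the same call can be applied to a whole column at once. A minimal sketch, assuming a character vector named labels (a hypothetical name standing in for dt$var_containing_strings from the question):

```r
# 'labels' is a made-up sample vector, not data from the question.
labels <- c("Auto Loan (Personal)", "Others", "Home Equity Line")

# Capture the first two words; ".*" swallows the rest of the string.
# Elements that do not match (fewer than two words) come back unchanged.
result <- sub("(\\w+\\s+\\w+).*", "\\1", labels)
result
#[1] "Auto Loan"   "Others"      "Home Equity"
```

Note that \\w only matches letters, digits and underscores, so a first or second word containing a hyphen or other punctuation would probably need a different pattern.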
Upvotes: 6
Reputation: 39858
If you want to use stringr::word(), you can do:
ifelse(is.na(word(x, 1, 2)), x, word(x, 1, 2))
[1] "Auto Loan" "Others"
Sample data:
x <- c("Auto Loan (Personal)", "Others")
Upvotes: 12
Reputation: 1136
Something like this?
a <- "this is a character string"
unlist(strsplit(a, " "))[1:2]
[1] "this" "is"
EDIT: To return the original string when it has fewer than 2 words, a simple if-else can be used:
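a <- "this is a character string"
words <- unlist(strsplit(a, " "))
if (length(words) > 2) {
  # join the first two words back into a single string
  paste(words[1:2], collapse = " ")
} else {
  # two words or fewer: return the original string unchanged
  a
}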
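The if-else above handles a single string. For a whole vector, one possible sketch is to wrap the same split-and-join idea in a helper; first_two_words is a hypothetical name, and the code assumes the input has no NA values and no leading whitespace:

```r
# Hypothetical helper; the name is illustrative and not from the answer above.
first_two_words <- function(x) {
  # Split each string on runs of whitespace; strsplit returns a list.
  parts <- strsplit(x, "\\s+")
  # Keep at most two words per element and join them with a single space.
  vapply(parts, function(w) paste(head(w, 2), collapse = " "), character(1))
}

first_two_words(c("Auto Loan (Personal)", "Others", "this is a character string"))
#[1] "Auto Loan" "Others"    "this is"
```

Using head(w, 2) avoids the explicit length check, since it simply returns the one word when there is only one. One difference from the if-else version is that multiple spaces between the two words are collapsed to a single space.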
Upvotes: 5