awadhesh204

Reputation: 93

How to extract first 2 words from a string in R?

I need to extract the first 2 words from a string. If the string contains more than 2 words, it should return the first 2 words; if it contains fewer than 2 words, it should return the string as it is.

I've tried using the word function from the stringr package, but it doesn't give the desired output when the string has fewer than 2 words.

word(dt$var_containing_strings, 1,2, sep=" ")

Example:

Input String: Auto Loan (Personal)
Output: Auto Loan

Input String: Others
Output: Others

Upvotes: 6

Views: 6507

Answers (3)

Ronak Shah

Reputation: 388982

You could use a regex in base R with sub:

sub("(\\w+\\s+\\w+).*", "\\1", "Auto Loan (Personal)")
#[1] "Auto Loan"

This also works if the text contains only one word:

sub("(\\w+\\s+\\w+).*", "\\1", "Auto")
#[1] "Auto"

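Explanation:

The pattern in parentheses, (\\w+\\s+\\w+), is a capture group that means:

\\w+ one word, followed by \\s+ whitespace, followed by \\w+ another word, so in total it captures two words. The trailing .* matches the rest of the string, and sub replaces the whole match with the backreference \\1, leaving only the two captured words. If the string has fewer than two words, the pattern does not match and sub returns the input unchanged.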
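Since sub is vectorised, the same call can be applied to a whole column at once. A minimal sketch, assuming the strings look like the examples in the question; first_two_words is a hypothetical helper name, not part of base R:

```r
# Hypothetical helper: keep the first two words of each element.
# Elements where the pattern does not match (fewer than two words)
# come back unchanged, because sub() only rewrites matches.
first_two_words <- function(x) {
  sub("(\\w+\\s+\\w+).*", "\\1", x)
}

x <- c("Auto Loan (Personal)", "Others")
first_two_words(x)
```

Note that \\w does not match punctuation, so strings whose first two "words" contain characters such as hyphens or brackets may need a different pattern.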

Upvotes: 6

tmfmnk

Reputation: 39858

If you want to use stringr::word(), you can do:

ifelse(is.na(word(x, 1, 2)), x, word(x, 1, 2))

[1] "Auto Loan" "Others" 

Sample data:

x <- c("Auto Loan (Personal)", "Others")

Upvotes: 12

denisafonin

Reputation: 1136

Something like this?

a <- "this is a character string"

unlist(strsplit(a, " "))[1:2]

[1] "this" "is" 

EDIT: To return the original string when the number of words is less than 2, a simple if-else statement can be used:

a <- "this is a character string"

words <- unlist(strsplit(a, " "))

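if (length(words) > 2) {
  # Join the first two words back into a single string
  paste(words[1:2], collapse = " ")
} else {
  # Two words or fewer: return the original string unchanged
  a
}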
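The if-else above handles a single string. For a vector of strings, the strsplit approach could be wrapped roughly as follows. This is a sketch under assumptions: first_two_split is a hypothetical name, words are separated by single spaces, and the input has no NA values.

```r
# Hypothetical helper: vectorised version of the strsplit() approach.
# head(words, 2) returns at most two words, so one-word strings
# pass through unchanged without an explicit length check.
first_two_split <- function(x) {
  vapply(
    strsplit(x, " ", fixed = TRUE),
    function(words) paste(head(words, 2), collapse = " "),
    character(1)
  )
}

first_two_split(c("Auto Loan (Personal)", "Others", "this is a character string"))
```

Using head instead of words[1:2] avoids the NA padding that indexing past the end of a vector would produce.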

Upvotes: 5
