colonus

Reputation: 11

Use substr until condition is met

I have a vector from which I need only the first word of each element. The words have different lengths and are separated by a symbol (either . or _). How can I use the substr() function to get a new vector containing just the first word?

I was thinking of something like this:

x <- c("wooombel.ab","mugran.cd","friendly_ef.ab","hungry_kd.xy")
y <- substr(x,0, ???)

Upvotes: 1

Views: 91

Answers (4)

Tyler Rinker

Reputation: 109844

An extraction approach with stringi:

library(stringi)
stri_extract_first_regex(x, "[a-z]+(?=[._])")

## [1] "wooombel" "mugran"   "friendly" "hungry"  

Though "[^._]+(?=[._])" may be more explicit, since it matches any run of characters other than the two separators.

Regex explanation:

[a-z]+                   any character of: 'a' to 'z' (1 or
                         more times)
(?=                      look ahead to see if there is:
  [._]                     any character of: '.', '_'
)                        end of look-ahead
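
For readers without stringi, the same look-ahead idea can be sketched in base R. This is only an illustration, assuming the `[^._]+` variant of the pattern; the variable names `m` and `first_word_regex` are made up for the example. Note that `regmatches()` drops elements that have no match, so the result can be shorter than `x` if some strings contain no separator.

```r
x <- c("wooombel.ab", "mugran.cd", "friendly_ef.ab", "hungry_kd.xy")

# regexpr() locates the first match in each element;
# perl = TRUE is needed because the pattern uses a look-ahead.
m <- regexpr("[^._]+(?=[._])", x, perl = TRUE)

# regmatches() pulls out the matched substrings.
first_word_regex <- regmatches(x, m)
first_word_regex
```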

Upvotes: 1

Rich Scriven

Reputation: 99331

You could also use package stringr. It has some really handy functions for string manipulation.

One that comes to mind for this problem is word. It has a sep argument that allows the use of a regular expression.

> x <- c("wooombel.ab","mugran.cd","friendly_ef.ab","hungry_kd.xy")
> library(stringr)
> word(x, sep = "[._]")
# [1] "wooombel" "mugran"   "friendly" "hungry"  

Another option that lets you keep using substr is str_locate. It returns a matrix of start and end positions for the first match, so subtracting 1 from the start column gives the position where the first word ends.

> substr(x, 1, str_locate(x, "[._]")[, "start"] - 1)
# [1] "wooombel" "mugran"   "friendly" "hungry"   
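
If you would rather stay in base R, a similar substr approach might look like the sketch below. It uses `regexpr()` in place of `str_locate()`. The extra element "plain" and the names `x2`, `pos` and `first_word_substr` are assumptions added for illustration, to show one way of handling strings that contain no separator (where `regexpr()` returns -1).

```r
# "plain" is a hypothetical element with no separator
x2 <- c("wooombel.ab", "mugran.cd", "friendly_ef.ab", "hungry_kd.xy", "plain")

# Position of the first "." or "_" in each element; -1 where there is none
pos <- as.integer(regexpr("[._]", x2))

# Keep the whole string when no separator is found,
# otherwise cut just before the separator
first_word_substr <- ifelse(pos == -1L, x2, substr(x2, 1, pos - 1L))
first_word_substr
```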

Upvotes: 2

rnso

Reputation: 24535

Try:

sapply(strsplit(x,'[._]'), function(x) x[1])
[1] "wooombel" "mugran"   "friendly" "hungry"  
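
A possible variant, sketched here as an assumption rather than part of the original answer, swaps `sapply` for `vapply` so that the result is guaranteed to be a character vector with one entry per input string. The names `parts` and `first_word_split` are illustrative.

```r
x <- c("wooombel.ab", "mugran.cd", "friendly_ef.ab", "hungry_kd.xy")

# Split each string on "." or "_", then take the first piece.
# character(1) tells vapply that each result must be a single string.
first_word_split <- vapply(strsplit(x, "[._]"),
                           function(parts) parts[1],
                           character(1))
first_word_split
```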

Upvotes: 2

sgibb

Reputation: 25726

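I think sub with a regular expression would be the easiest solution. The pattern matches the first separator and everything after it, and sub replaces that with an empty string: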

sub(pattern = "[._].*", replacement = "", x = x)
# [1] "wooombel" "mugran"   "friendly" "hungry"

Upvotes: 4
