Reputation: 11
I have a vector from which I need only the first word of each element. The words have different lengths and are separated by a symbol (. or _). How can I use the substr() function to get a new vector with just the first word?
I was thinking of something like this:
x <- c("wooombel.ab","mugran.cd","friendly_ef.ab","hungry_kd.xy")
y <- substr(x,0, ???)
Upvotes: 1
Views: 91
Reputation: 109844
An extraction approach with stringi:
library(stringi)
stri_extract_first_regex(x, "[a-z]+(?=[._])")
## [1] "wooombel" "mugran" "friendly" "hungry"
Though "^[a-z]+(?=[._])" may be more explicit, since it anchors the match at the start of the string.
^ the beginning of the string
[a-z]+ any character of: 'a' to 'z' (1 or more times)
(?= look ahead to see if there is:
[._] any character of: '.', '_'
) end of look-ahead
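If you would rather not load a package, the same look-ahead idea can be sketched in base R. This assumes the first word consists only of lowercase letters, as in the example data; `first_word` is just an illustrative name.

```r
x <- c("wooombel.ab", "mugran.cd", "friendly_ef.ab", "hungry_kd.xy")

# perl = TRUE is required: the default regex engine does not support look-ahead
m <- regexpr("^[a-z]+(?=[._])", x, perl = TRUE)

# regmatches() returns only the matched substrings
first_word <- regmatches(x, m)
first_word
## [1] "wooombel" "mugran"   "friendly" "hungry"
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than x if some strings contain no separator.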
Upvotes: 1
Reputation: 99331
You could also use the stringr package. It has some really handy functions for string manipulation. One that comes to mind for this problem is word(). Its sep argument accepts a regular expression.
> x <- c("wooombel.ab","mugran.cd","friendly_ef.ab","hungry_kd.xy")
> library(stringr)
> word(x, sep = "[._]")
# [1] "wooombel" "mugran" "friendly" "hungry"
Another option that lets you keep using substr() is str_locate(). It returns a matrix with the start and end positions of the first match, so subtracting 1 from the start column gives the position where the first word ends.
> substr(x, 1, str_locate(x, "[._]")[, "start"] - 1)
# [1] "wooombel" "mugran" "friendly" "hungry"
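The same substr() idea also works without stringr. Here is a minimal base R sketch using regexpr(). The fallback for strings that contain no separator is my own assumption about what you would want in that case; `pos` and `stop_at` are illustrative names.

```r
x <- c("wooombel.ab", "mugran.cd", "friendly_ef.ab", "hungry_kd.xy")

# Position of the first "." or "_" in each string (-1 if there is none)
pos <- as.integer(regexpr("[._]", x))

# Stop one character before the separator; keep the whole string if none was found
stop_at <- ifelse(pos == -1L, nchar(x), pos - 1L)

y <- substr(x, 1, stop_at)
y
# [1] "wooombel" "mugran"   "friendly" "hungry"
```

Also note that substr() positions start at 1 in R, so the 0 in the question's attempt should be 1.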
Upvotes: 2
Reputation: 24535
Try:
sapply(strsplit(x,'[._]'), function(x) x[1])
[1] "wooombel" "mugran" "friendly" "hungry"
Upvotes: 2
Reputation: 25726
I think sub() with a regular expression would be the easiest solution:
sub(pattern = "[._].*", replacement = "", x = x)
# [1] "wooombel" "mugran" "friendly" "hungry"
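To spell out the pattern: "[._]" matches the first separator and ".*" matches everything after it, so the whole tail is replaced with an empty string. A small sketch of how this behaves when a string has no separator at all (the extra element "plain" is made up for illustration):

```r
# "plain" is a hypothetical element with no "." or "_" in it
x2 <- c("wooombel.ab", "friendly_ef.ab", "plain")

# Remove the first separator and everything after it
first_word <- sub("[._].*", "", x2)
first_word
# [1] "wooombel" "friendly" "plain"
```

Strings without a separator are returned unchanged, which is often what you want.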
Upvotes: 4