ifoxfoot

Reputation: 223

How do I get the first character of the last word in a string in R?

So I have a list of names, and I want to extract the first character of the last word in the name. I can get the last word, but not the first character of the last word.

species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS", 
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM", 
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

# can get the last word
str_extract(species, "\\w+$")
[1] "BOREALIS"    "MILLEFOLIUM" "SIBIRICUM"

What I want is [1] "B" "M" "S"

Upvotes: 1

Views: 159

Answers (3)

The fourth bird

Reputation: 163362

With str_extract you could also assert a whitespace boundary to the left, match the first word character that follows, and assert that only optional word characters remain up to the end of the string.

If you want to match any non-whitespace character, you can use \\S instead of \\w.

library(stringr)

species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS", 
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM", 
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

str_extract(species, "(?<!\\S)\\w(?=\\w*$)")

Output

[1] "B" "M" "S"

See an R demo.
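
To illustrate the \\w versus \\S remark, here is a minimal sketch. It uses base R with perl = TRUE as a stand-in for str_extract (the lookarounds behave the same way here), and both the second input string and the helper name extract_first are made up for the example, not part of the question.

```r
# Hypothetical input: the last word of the second name starts with "(",
# which is a non-whitespace character but not a word character.
x <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS", "SOME GENUS (HYBRID)")

# Hypothetical helper: returns the first match per element, NA when there is none.
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- m != -1L
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)
  out
}

# \\w version: finds nothing in the second string
w_result <- extract_first(x, "(?<!\\S)\\w(?=\\w*$)")

# \\S version: accepts any non-whitespace character as the first character
s_result <- extract_first(x, "(?<!\\S)\\S(?=\\S*$)")

print(w_result)  # "B" NA
print(s_result)  # "B" "("
```

So \\w is the stricter choice, and \\S is the one to pick if the last word may start with punctuation.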

Upvotes: 1

Bensstats

Reputation: 1056

This might not be the most elegant solution, but you can always pipe into str_extract() a second time to get the first character of the last word.


library(stringr)
species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS", 
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM", 
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

str_extract(species, "(\\w+$)") |> 
  str_extract("^[A-Z]")

[1] "B" "M" "S"
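
Keep in mind that ^[A-Z] only matches an uppercase ASCII letter, so it returns NA if the last word starts with anything else. As a rough sketch of the same two-step idea in base R (the variable names last_word and first_char are just illustrative), you could take the first character by position instead:

```r
species <- c("ACHILLEA MILLEFOLIUM VAR. BOREALIS",
             "ACHILLEA MILLEFOLIUM VAR. MILLEFOLIUM",
             "ALLIUM SCHOENOPRASUM VAR. SIBIRICUM")

# Step 1: drop everything up to and including the last whitespace character
last_word <- sub(".*\\s", "", species)

# Step 2: take the first character of what is left, whatever it is
first_char <- substr(last_word, 1, 1)

print(first_char)  # "B" "M" "S"
```

This assumes the strings have no trailing whitespace; if they might, wrapping species in trimws() first would be one way to handle it.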

Upvotes: 2

akrun

Reputation: 887128

We can capture the first non-whitespace character (\\S) of the last word, match the one or more non-whitespace characters that follow it (\\S+) up to the end ($) of the string, and replace the whole match with the backreference (\\1) to the captured group.

sub(".*\\s+(\\S)\\S+$", "\\1", species)
[1] "B" "M" "S"
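
One thing to be aware of: when the pattern does not match, sub() returns the input unchanged. That happens here for a single-word string or a one-character last word. The sketch below shows this with made-up inputs, along with a possible variant (my own assumption, not part of the answer above) that makes the leading part optional and uses perl = TRUE for the non-capturing group.

```r
# Hypothetical inputs: a normal name, a single word, and a one-character last word
x <- c("ALLIUM SCHOENOPRASUM VAR. SIBIRICUM", "ACHILLEA", "ALLIUM VAR. B")

# Pattern from the answer: needs whitespace plus at least two characters after it
orig <- sub(".*\\s+(\\S)\\S+$", "\\1", x)

# Variant: optional "anything up to the last whitespace", then the captured
# first character, then zero or more non-whitespace characters to the end
variant <- sub("^(?:.*\\s)?(\\S)\\S*$", "\\1", x, perl = TRUE)

print(orig)     # "S" "ACHILLEA" "ALLIUM VAR. B"
print(variant)  # "S" "A" "B"
```

Neither version handles trailing whitespace, so trim the input first if that can occur.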

Upvotes: 2
