sakwa

Reputation: 65

Extract character from string based on character in another vector in R

I'm totally new to the community and hope my question and example meet the criteria.

I've got a data frame with two character vectors. The values in vector a vary in length; the values in vector b each consist of exactly one character.

a <- as.character(c("tsm", "skr", "fl", "pfl", "ts", "St", "S"))
b <- as.character(c("m", "k", "l", "l", "s", "t", "S"))
uedf <- data.frame(a, b)

For each row, I want to extract the character in a that sits directly to the left of the character given in b. The position of that character within the string can vary. So from the first string I want to extract "s" (left of m), from the second again "s" (left of k), and so on.

As I couldn't figure out how to do this using grepl() (I'm not very familiar with regex), I finally ended up with a combination of strsplit() and str_sub().

str_sub(strsplit(uedf$a,split=uedf$b, fixed=FALSE), start = -1, end = -1)

This works well for most cases except the second, where it returns ")" instead of the desired "s".

[1] "s" ")" "f" "f" "t" "S" "" 

Any ideas why this might be and how I could solve the problem? Thanks in advance!

Upvotes: 2

Views: 903

Answers (3)

Brian Davis

Reputation: 992

Here I locate the position in each string that matches your index character and store one less than that position in i. Then I extract the character at position i.

i <- mapply(regexpr, b, a) - 1
substr(a, i, i)
[1] "s" "s" "f" "f" "t" "S" "" 
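
A possible variation on the idea above, sketched under the assumption that b may contain characters that are special in regular expressions (such as "."). Passing fixed = TRUE through MoreArgs makes regexpr treat each b as literal text. The helper name left_of is hypothetical and not part of the original answer.

```r
# Hypothetical helper: character immediately left of the first literal
# occurrence of b[i] in a[i]; "" when there is no match or nothing to the left.
left_of <- function(a, b) {
  # regexpr is not vectorised over its pattern, hence mapply
  pos <- unname(mapply(regexpr, b, a, MoreArgs = list(fixed = TRUE)))
  substr(a, pos - 1, pos - 1)
}

a <- c("tsm", "skr", "fl", "pfl", "ts", "St", "S")
b <- c("m", "k", "l", "l", "s", "t", "S")
left_of(a, b)
#[1] "s" "s" "f" "f" "t" "S" ""

left_of("a.b", ".")   # "." is matched literally here
#[1] "a"
```

Without fixed = TRUE, the pattern "." would match the first character of "a.b", so the result would be "".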

Upvotes: 2

Frostic

Reputation: 680

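I think str_sub only works on character vectors, but strsplit returns a list. For the second string, the list element is a vector of two strings ("s" and "r"), so when the list is coerced to character that element becomes the text c("s", "r"), whose last character is ")".

This does the job as long as the separator appears only once in each string: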

library(stringr)  # provides str_sub
sapply(strsplit(a, split=b, fixed=FALSE), function(l) str_sub(l[[1]], -1, -1))
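
To see where the ")" most likely comes from, here is a small base R sketch. It assumes that str_sub coerces the list the same way as.character does, which matches the output in the question but is not confirmed there. The variable names parts, coerced and last_of_first are only illustrative.

```r
a <- c("tsm", "skr", "fl", "pfl", "ts", "St", "S")
b <- c("m", "k", "l", "l", "s", "t", "S")

parts <- strsplit(a, split = b)   # a list; split is recycled along a
coerced <- as.character(parts)    # multi-element pieces get deparsed
coerced[2]                        # the text c("s", "r"), which ends in ")"

# Take the last character of the first piece of each element instead.
last_of_first <- vapply(parts, function(p) {
  substr(p[1], nchar(p[1]), nchar(p[1]))
}, character(1))
last_of_first
#[1] "s" "s" "f" "f" "t" "S" ""
```

This version needs no extra packages, since substr and nchar stand in for str_sub.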

Upvotes: 2

Maurits Evers

Reputation: 50668

Here is a solution using base R's gsub:

sapply(seq_along(a), function(i) ifelse(
    nchar(a[i]) > 1,
    gsub(paste0("^.*(\\w)", b[i], ".*$"), "\\1", a[i]),
    ""))
#[1] "s" "s" "f" "f" "t" "S" ""

Or, more concisely, using mapply (thanks to @thelatemail):

mapply(function(a,b) ifelse(
    nchar(a) > 1, 
    gsub(paste0("^.*(\\w)", b, ".*$"), "\\1", a), 
    ""), a, b)

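One thing that may be worth checking before picking an approach: if the character from b can occur more than once in a string, a regexpr-based lookup and the greedy gsub pattern pick different occurrences. A minimal sketch with a made-up input ("xbyb" is not from the question's data):

```r
x  <- "xbyb"   # hypothetical string containing "b" twice
ch <- "b"

# regexpr reports the first match, so this is the character left of the first "b"
pos <- as.integer(regexpr(ch, x))
first_left <- substr(x, pos - 1, pos - 1)
first_left
#[1] "x"

# the leading ".*" is greedy, so the capture sits left of the last "b"
last_left <- gsub(paste0("^.*(\\w)", ch, ".*$"), "\\1", x)
last_left
#[1] "y"
```

For the data in the question each character occurs only once, so both give the same result.
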
Upvotes: 2
