D. Studer

Reputation: 1875

R: explode a character-string and get the last element (row-wise)

I have the following data-frame

df <- data.frame(var1 = c("f253.02.ds.a01", "f253.02.ds.a02", "f253.02.ds.x.a01", "f253.02.ds.x.a02", "f253.02.ds.a10", "test"))
df

What's the easiest way to extract the last two digits of the variable var1 as a number (e.g. 1, 2, 10, NA)? I was experimenting with separate(), but the number of dots in the strings is not always the same. Maybe with regular expressions?

Upvotes: 1

Views: 275

Answers (2)

akrun

Reputation: 887203

With separate(), we can split on a regex lookaround:

library(dplyr)
library(tidyr)
df %>%
  separate(var1, into = c('prefix', 'suffix'),
           sep = "(?<=[a-z])(?=\\d+$)", remove = FALSE, convert = TRUE)

-output

#              var1         prefix suffix
#1   f253.02.ds.a01   f253.02.ds.a      1
#2   f253.02.ds.a02   f253.02.ds.a      2
#3 f253.02.ds.x.a01 f253.02.ds.x.a      1
#4 f253.02.ds.x.a02 f253.02.ds.x.a      2
#5   f253.02.ds.a10   f253.02.ds.a     10
#6             test           test     NA
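
To see what the lookaround does without tidyr, here is a minimal base R sketch. It assumes the same six values as in the question; the names x, pos, hit, prefix and suffix are made up for illustration. The pattern matches a zero-width position that is preceded by a lowercase letter and followed only by digits up to the end of the string, which should be the point where separate() cuts.

```r
# Same sample values as in the question (x is a hypothetical name)
x <- c("f253.02.ds.a01", "f253.02.ds.a02", "f253.02.ds.x.a01",
       "f253.02.ds.x.a02", "f253.02.ds.a10", "test")

# Position of the zero-width match; -1 where the pattern does not match.
# perl = TRUE is needed for lookbehind/lookahead in base R.
pos <- as.integer(regexpr("(?<=[a-z])(?=\\d+$)", x, perl = TRUE))
hit <- pos > 0

# Default: whole string as prefix, NA as suffix (the "test" case)
prefix <- x
suffix <- rep(NA_integer_, length(x))

# Cut at the matched position only where a match exists
prefix[hit] <- substr(x[hit], 1, pos[hit] - 1)
suffix[hit] <- as.integer(substring(x[hit], pos[hit]))

data.frame(var1 = x, prefix, suffix)
```

Filling only the matched rows avoids coercing "test" to a number, so no coercion warning is raised for the non-matching row.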

Upvotes: 1

G. Grothendieck

Reputation: 269664

The expected output shown in the question has 4 elements but the input has 6 rows, so we assume that the expected output in the question is erroneous and that the correct output is the one shown below.

Now, assuming that the 2 digits are preceded by a non-digit, and noting that \D means non-digit (the backslash must be doubled within double quotes), we can remove everything up to and including the last non-digit:

library(dplyr)
df %>% mutate(last2 = as.numeric(sub(".*\\D", "", var1)))

giving:

              var1 last2
1   f253.02.ds.a01     1
2   f253.02.ds.a02     2
3 f253.02.ds.x.a01     1
4 f253.02.ds.x.a02     2
5   f253.02.ds.a10    10
6             test    NA
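
As a rough illustration of why this works, the intermediate result of sub() can be inspected in base R. The vector x and the name stripped are hypothetical, and the value "123" is an added edge case that is not in the question, included only to show what happens when the non-digit assumption does not hold.

```r
# A few values from the question plus one assumed edge case ("123")
x <- c("f253.02.ds.a01", "f253.02.ds.a10", "test", "123")

# ".*\\D" is greedy, so it matches up to and including the LAST non-digit;
# whatever follows it (possibly nothing) is kept
stripped <- sub(".*\\D", "", x)
stripped
# "01" "10" ""   "123"

# An empty string converts to NA; leading zeros are dropped
as.numeric(stripped)
# 1  10  NA 123
```

Note that a string made only of digits contains no non-digit, so sub() leaves it unchanged and the whole value is returned rather than its last two digits.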

Upvotes: 1
