pyll

Reputation: 1764

Extract last component of string only if numeric in R

I have a data frame column whose strings contain multiple . separators. I want to split off the characters after the last occurrence of . but only if they are numeric. So in the example below, a.b.c will remain intact, but a.b.1 will become two values: a.b and 1 (rows without a numeric suffix get 0). I think I'm close but can't figure out the final piece to pull it together.

have <- data.frame(x = c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c"))

want <- data.frame(x = c("a.b", "a.b.c", "a.b", "a.b", "9.a.b.c"),
                   y = c(0, 0, 1, 2, 0))
        
# attempt 1
have %>% mutate(y = sub('.*\\.', '', x))
        
# attempt 2
have %>% separate(x, c('y', 'z'), sep = '.*\\.', extra = 'merge', remove = FALSE)

Upvotes: 2

Views: 163

Answers (4)

akrun

Reputation: 887108

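An option with stringi: extract the trailing digits into y, then strip that suffix from x

library(stringi)
# digits at the end of the string that directly follow a "." (NA if there are none)
have$y <- as.integer(stri_extract_last_regex(have$x, "(?<=\\.)\\d+$"))
have$y[is.na(have$y)] <- 0
# drop the ".<digits>" suffix from x
have$x <- stri_replace_last_regex(have$x, "\\.\\d+$", "")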

Upvotes: 2

rpolicastro

Reputation: 1305

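Here's a tidyverse solution with separate. The regex splits on a . only when it is followed by digits up to the end of the string.

library("tidyr")

# convert = TRUE makes y an integer, so the 0L replacement matches its type
have %>%
  separate(x, c("x", "y"), "\\.(?=\\d+$)", fill = "right", convert = TRUE) %>%
  replace_na(list(y = 0L))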

        x y
1     a.b 0
2   a.b.c 0
3     a.b 1
4     a.b 2
5 9.a.b.c 0
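
To see what the lookahead pattern does on its own, here is a minimal base R sketch using strsplit with perl = TRUE. It only illustrates the regex and is not part of the tidyr call above; the vector x and the name parts are made up for the example.

```r
# Example strings from the question
x <- c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c")

# Split on a "." only when everything after it, up to the end, is digits.
# perl = TRUE is needed for the (?=...) lookahead.
parts <- strsplit(x, "\\.(?=\\d+$)", perl = TRUE)

# Strings with a numeric last component split into two pieces,
# all others stay as a single piece.
lengths(parts)
parts[[3]]
```

Because the lookahead does not consume the digits, they survive as the second piece, which is what separate puts into y.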

Upvotes: 3

Mike V

Reputation: 1364

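You can try this way

library(tidyverse)  # loads dplyr, tidyr and stringr

# Note: this only extracts the trailing number; x itself is left unchanged
want2 <- have %>% 
  mutate(y = as.integer(str_extract(x, "\\d+$"))) %>% 
  mutate(y = replace_na(y, 0L))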
#         x y
# 1     a.b 0
# 2   a.b.c 0
# 3   a.b.1 1
# 4   a.b.2 2
# 5 9.a.b.c 0

Upvotes: -1

Duck

Reputation: 39595

Try this base R approach:

# Data
have <- data.frame(x = c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c"), stringsAsFactors = FALSE)
# Text after the last "." as a number; non-numeric text becomes NA
have$y <- suppressWarnings(as.numeric(sub('.*\\.', '', have$x)))
# Where y is numeric, drop the last "." and everything after it from x
have$x <- ifelse(!is.na(have$y), sub("^(.*)[.].*", "\\1", have$x), have$x)
# Replace NA by zero
have$y[is.na(have$y)] <- 0

Output:

        x y
1     a.b 0
2   a.b.c 0
3     a.b 1
4     a.b 2
5 9.a.b.c 0
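
A variation on the same idea, as a rough sketch: test for the ".digits" suffix once with grepl and only touch the matching rows, which avoids the intermediate NA step. The helper name split_trailing_number is hypothetical and not from any package.

```r
# Hypothetical helper: split a trailing ".<digits>" component off each string
split_trailing_number <- function(x) {
  has_num <- grepl("\\.[0-9]+$", x)            # TRUE where x ends in ".<digits>"
  y <- integer(length(x))                      # y defaults to 0 for every element
  # keep only the text after the last "." and convert it to integer
  y[has_num] <- as.integer(sub(".*\\.", "", x[has_num]))
  # remove the ".<digits>" suffix from the matching strings
  x[has_num] <- sub("\\.[0-9]+$", "", x[has_num])
  data.frame(x = x, y = y, stringsAsFactors = FALSE)
}

have <- data.frame(x = c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c"),
                   stringsAsFactors = FALSE)
res <- split_trailing_number(have$x)
res
```

Here y comes back as an integer column rather than numeric; wrap it in as.numeric if the type needs to match want exactly.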

Upvotes: 2
