Reputation: 1764
I have a dataframe that has multiple . separators. I wish to remove the characters after the last occurrence of . but only if they are numeric. So in the example below, a.b.c will remain intact, but a.b.1 will become two values: a.b and 1. Rows without a numeric suffix should get 0 in the new column. I think I'm close but can't figure out the final piece to pull it together.
have <- data.frame(x = c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c"))
want <- data.frame(x = c("a.b", "a.b.c", "a.b", "a.b", "9.a.b.c"),
y = c(0, 0, 1, 2, 0))
# attempt 1
have %>% mutate(y = sub('.*\\.', '', x))
# attempt 2
have %>% separate(x, c('y', 'z'), sep = '.*\\.', extra = 'merge', remove = FALSE)
Upvotes: 2
Views: 163
Reputation: 887108
An option with stringi
library(stringi)
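# Digits after the last dot become y (0 where there are none), then the suffix is dropped from x
have$y <- as.integer(stri_extract_last_regex(have$x, "(?<=\\.)\\d+$"))
have$y[is.na(have$y)] <- 0L
have$x <- stri_replace_last_regex(have$x, "\\.\\d+$", "")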
Upvotes: 2
Reputation: 1305
Here's a tidyverse solution with separate
library("tidyr")
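have %>%
  # convert = TRUE makes y an integer column, so the 0L replacement has a matching type
  separate(x, c("x", "y"), "\\.(?=\\d+$)", fill = "right", convert = TRUE) %>%
  replace_na(list(y = 0L))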
        x y
1     a.b 0
2   a.b.c 0
3     a.b 1
4     a.b 2
5 9.a.b.c 0
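The pattern \\.(?=\\d+$) matches a dot only when everything after it, up to the end of the string, is digits; the lookahead keeps those digits out of the match, so only the dot is consumed by the split. A minimal base R sketch of the same idea, using strsplit with perl = TRUE as a stand-in for separate (an illustration only, not necessarily what tidyr does internally):

```r
x <- c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c")

# Split on a dot only if it is followed by digits running to the end of the string
parts <- strsplit(x, "\\.(?=\\d+$)", perl = TRUE)

# Number of pieces per string: 2 only where a numeric suffix was split off
lengths(parts)
```

Strings such as a.b.c and 9.a.b.c come back as a single piece because no dot in them is followed by digits only.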
Upvotes: 3
Reputation: 1364
You can try this way
library(tidyverse)
library(stringr)
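want2 <- have %>%
  # as.integer() keeps y numeric so that replace_na() gets a replacement of the same type
  mutate(y = as.integer(str_extract(x, "\\d+$"))) %>%
  mutate(y = replace_na(y, 0L))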
#         x y
# 1     a.b 0
# 2   a.b.c 0
# 3   a.b.1 1
# 4   a.b.2 2
# 5 9.a.b.c 0
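Note that the output above still carries the numeric suffix in x (rows 3 and 4), which is not what the question asks for. A possible variation, assuming dplyr and stringr are available, that also trims x; the name want3 and the stricter lookbehind pattern are my own additions for illustration:

```r
library(dplyr)
library(stringr)

have <- data.frame(x = c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c"),
                   stringsAsFactors = FALSE)

want3 <- have %>%
  mutate(
    # Digits after the last dot, or 0 when there are none (computed before x is changed)
    y = coalesce(as.integer(str_extract(x, "(?<=\\.)\\d+$")), 0L),
    # Drop the ".digits" suffix from x; other strings are left alone
    x = str_remove(x, "\\.\\d+$")
  )
```

The lookbehind requires the digits to follow a dot, so the extraction and the removal agree on which rows have a numeric suffix.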
Upvotes: -1
Reputation: 39595
Try this base R approach:
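# Data
have <- data.frame(x = c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c"), stringsAsFactors = FALSE)
# Text after the last dot as a number; non-numeric suffixes become NA (coercion warning silenced)
have$y <- suppressWarnings(as.numeric(sub('.*\\.', '', have$x)))
# Drop the last ".suffix" from x only where that suffix was numeric
have$x <- ifelse(!is.na(have$y), sub("^(.*)[.].*", "\\1", have$x), have$x)
# Replace NA with zero
have$y[is.na(have$y)] <- 0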
Output:
        x y
1     a.b 0
2   a.b.c 0
3     a.b 1
4     a.b 2
5 9.a.b.c 0
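If this needs to be applied to more than one column, the same steps can be wrapped in a small helper. This is only a sketch: the function name split_numeric_suffix is hypothetical, and it tests for a numeric suffix with a regex up front instead of relying on NA coercion, so no warning is produced.

```r
# Hypothetical helper: splits a trailing ".digits" off each string
split_numeric_suffix <- function(x) {
  has_num <- grepl("\\.[0-9]+$", x)
  # y defaults to 0; only strings with a numeric suffix are converted
  y <- numeric(length(x))
  y[has_num] <- as.numeric(sub(".*\\.", "", x[has_num]))
  # Remove the suffix where present; other strings are returned unchanged
  data.frame(x = sub("\\.[0-9]+$", "", x), y = y, stringsAsFactors = FALSE)
}

result <- split_numeric_suffix(c("a.b", "a.b.c", "a.b.1", "a.b.2", "9.a.b.c"))
```

Unlike the NA-based version, a string with no dot at all (for example "9") is left as it is and gets y = 0.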
Upvotes: 2