Reputation: 17
Let's say I have a data frame with vectors A:E where vector E looks like this:
ABCDEF50GH
ABCDEF600GH
ABCDEF50GH
ABCDEF1000GH
Part of my code looks like this:
DF <- (filter(DF1, A == "AH") %>%
  mutate(B = nchar(E),
         C = case_when(D == "X" ~ "0",
                       B == 10 ~ substr(E, 7, 8),
                       B == 11 ~ substr(E, 7, 9),
                       B == 12 ~ substr(E, 7, 10),
                       TRUE ~ "0")))
So I am trying to extract a number from a string. The problem is that the extracted number is a character, not a number, so I need to make the other branches of case_when characters too. Therefore vector C is a character vector, and when I try to convert it to numeric:
transform(DF, C = as.numeric(levels(C))[C])
I get a vector with NAs instead of numbers.
Pls help
Upvotes: 0
Views: 375
Reputation: 11480
data: borrowed from JBGruber
sample.df <- data.frame(
E = c(
"ABCDEF50GH",
"ABCDEF600GH",
"ABCDEF50GH",
"ABCDEF1000GH",
"ABCDEF600G400H"
), stringsAsFactors = FALSE)
base solution to extract the last number:
m <- gregexpr("\\d+(?=\\D+$)", text = sample.df$E, perl = TRUE)
sample.df$E_numbers <- as.numeric(regmatches(sample.df$E, m))
result:
# E E_numbers
#1 ABCDEF50GH 50
#2 ABCDEF600GH 600
#3 ABCDEF50GH 50
#4 ABCDEF1000GH 1000
#5 ABCDEF600G400H 400
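One caveat: as.numeric() on the list returned by regmatches() relies on every string having exactly one match. If some rows might contain no digits at all, a variant along these lines could be safer. This is only a sketch; the vector x and the "NODIGITS" entry are made up for illustration and are not part of the question's data.

```r
# Hypothetical input: the last element has no digits at all
x <- c("ABCDEF50GH", "ABCDEF600G400H", "NODIGITS")

# regexpr() returns -1 for elements where the pattern does not match
m <- regexpr("\\d+(?=\\D+$)", x, perl = TRUE)

# Start with NA everywhere, then fill in only the elements that matched
last_number <- rep(NA_real_, length(x))
last_number[m > 0] <- as.numeric(regmatches(x, m))

print(last_number)
```

Elements without a match stay NA instead of causing an error or a length mismatch.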
Upvotes: 0
Reputation: 12420
Using stringr to extract digits and then simply transform the outcome to a numeric vector:
library(dplyr)
library(stringr)
sample.df <- data.frame(E = c(
"ABCDEF50GH",
"ABCDEF600GH",
"ABCDEF50GH",
"ABCDEF1000GH"
),
stringsAsFactors = FALSE)
sample.df <- sample.df %>%
  mutate(E_numbers = str_extract_all(E, "[[:digit:]]+")) %>%
  mutate(E_numbers = unlist(E_numbers)) %>%
  mutate(E_numbers = as.numeric(E_numbers))
> sample.df
E E_numbers
1 ABCDEF50GH 50
2 ABCDEF600GH 600
3 ABCDEF50GH 50
4 ABCDEF1000GH 1000
str_extract_all() returns a list, which can be tricky to handle, so I use unlist() to flatten it. This works here because every string contains exactly one number. Other than that, it should be straightforward :)
Note: the difference between str_extract_all() and str_extract() is that str_extract() will only catch the first number in your strings. So if one of the strings in E was "ABCDEF600G400H", str_extract_all() would return the numbers 600 and 400 while str_extract() would return 600. Not sure what is preferable in your case.
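To see the first-match versus all-matches difference without installing anything, here is a rough base R analogue. It uses regexpr() and gregexpr() rather than the stringr functions, so treat it as an illustration of the idea, not as the stringr code itself; the vector x is made up for the example.

```r
# Base R analogue (an illustration, not the stringr code above)
x <- c("ABCDEF50GH", "ABCDEF600G400H")

# regexpr() locates only the first run of digits in each string
first_match <- regmatches(x, regexpr("[[:digit:]]+", x))

# gregexpr() locates every run of digits, so the result is a list
all_matches <- regmatches(x, gregexpr("[[:digit:]]+", x))

print(first_match)
print(all_matches)

# Number of matches per string
print(lengths(all_matches))
```

Since the second string yields two matches, unlisting all_matches gives three values for two strings, which would no longer line up with the rows of the data frame.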
Edit: If you want to extract only the last number in "ABCDEF600G400H", we can use the stringi package instead of stringr:
library(dplyr)
library(stringi)
sample.df <- data.frame(
E = c(
"ABCDEF50GH",
"ABCDEF600GH",
"ABCDEF50GH",
"ABCDEF1000GH",
"ABCDEF600G400H"
), stringsAsFactors = FALSE)
sample.df <- sample.df %>%
  # stri_extract_last_regex() already returns a character vector, so no unlist() is needed
  mutate(E_numbers = as.numeric(stri_extract_last_regex(E, "[[:digit:]]+")))
> sample.df
E E_numbers
1 ABCDEF50GH 50
2 ABCDEF600GH 600
3 ABCDEF50GH 50
4 ABCDEF1000GH 1000
5 ABCDEF600G400H 400
Upvotes: 0
Reputation: 1418
You can do it using the stringr package:
text <- as.data.frame(c("ABCDEF50GH",
                        "ABCDEF600GH",
                        "ABCDEF50GH",
                        "ABCDEF1000GH"))
colnames(text) <- c("names")
library(stringr)
text$numerics <- str_extract(text$names, "[[:digit:]]+")
If you want to convert it to numeric, just wrap the call in as.numeric():
text$numerics <- as.numeric(str_extract(text$names, "[[:digit:]]+"))
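As for why the attempt in the question gave NAs: as.numeric(levels(C))[C] is an idiom meant for factors, and as far as I can tell C in the question is a plain character vector (that is what case_when returns there). A small sketch, with a stand-in vector C in place of the real column:

```r
# C here stands in for the character column produced by case_when()
C <- c("50", "600", "50", "1000")

# levels() is NULL for a character vector, so there is nothing to index into
print(levels(C))   # NULL

# Indexing an empty numeric vector by name yields NA for every element
idiom <- as.numeric(levels(C))[C]
print(idiom)

# Converting the character vector directly is enough
direct <- as.numeric(C)
print(direct)
```

So for a character column, a plain as.numeric(C) should do the job.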
Upvotes: 1