Reputation: 6874
I have some text in a dataframe as follows
Input
rownumber CStage
1 38-40cm
2 27-22
3 32cm and 40cm
I want to subtract the two numbers in each CStage
with the output being
Desired output
rownumber CStage
1 2
2 5
3 8
I have used stringr::str_extract_all(df$CStage,"\\d{2}")
which gives me a list with each element containing two numbers
[[1]]
[1] "38" "40"
[[2]]
[1] "27" "22"
[[3]]
[1] "32" "40"
How can I then subtract the two numbers in each element (so that the result is always positive)?
Upvotes: 0
Views: 102
Reputation: 269481
1) strapply This can be done compactly using strapply in gsubfn. Define a regular expression whose two capture groups extract the two numbers from each element of CStage. strapply passes them to the anonymous function, written in formula notation, which returns the absolute value of their difference.
library(gsubfn)
transform(DF, CStage = strapply(CStage,
                                "(\\d+)\\D+(\\d+)",
                                ~ abs(as.numeric(x) - as.numeric(y)),
                                simplify = TRUE))
giving:
rownumber CStage
1 1 2
2 2 5
3 3 8
2) Base R A base R solution can be obtained by replacing the non-digits in CStage with spaces and then reading the result with read.table to create a data frame having V1 and V2 columns. Subtract those columns and take the absolute value.
transform(DF, CStage = with(read.table(text = gsub("\\D", " ", CStage)), abs(V1-V2)))
giving:
rownumber CStage
1 1 2
2 2 5
3 3 8
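To see why (2) works, it can help to look at the intermediate values. The following is a minimal sketch, assuming the same three strings as in the question; the names spaced and parts are only illustrative.

```r
# The three strings from the question
CStage <- c("38-40cm", "27-22", "32cm and 40cm")

# Replace every non-digit character with a space, so only the numbers remain
spaced <- gsub("\\D", " ", CStage)

# read.table splits on whitespace and names the columns V1 and V2 by default
parts <- read.table(text = spaced)
print(parts)

# Absolute difference of the two columns
abs(parts$V1 - parts$V2)
```

This relies on every row containing exactly two numbers; read.table would likely fail or pad with NA if the rows had differing counts.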
3) dplyr/tidyr A dplyr and tidyr solution that takes a similar approach to (2) is:
library(dplyr)
library(tidyr)
DF %>%
  separate(CStage, into = c("V1", "V2"), sep = "\\D+",
           extra = "drop", convert = TRUE) %>%
  mutate(CStage = abs(V1 - V2)) %>%
  select(rownumber, CStage)
giving:
rownumber CStage
1 1 2
2 2 5
3 3 8
The input in reproducible form is:
Lines <- "
rownumber,CStage
1,38-40cm
2,27-22
3,32cm and 40cm"
DF <- read.csv(text = Lines, as.is = TRUE)
Upvotes: 1
Reputation: 11480
You can also sort the numbers and then use diff.
sapply(regmatches(df1$CStage, gregexpr("\\d+", df1$CStage)), function(x)diff(sort(as.numeric(x))))
#[1] 2 5 8
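To get the data frame shown in the desired output, the result can be assigned back to the column. This is a small sketch in base R, assuming the input from the question is rebuilt as df1; nums is just an illustrative name.

```r
# Input rebuilt from the question's data
df1 <- data.frame(rownumber = 1:3,
                  CStage = c("38-40cm", "27-22", "32cm and 40cm"),
                  stringsAsFactors = FALSE)

# Extract every run of digits from each string
nums <- regmatches(df1$CStage, gregexpr("\\d+", df1$CStage))

# Sorting first makes diff() non-negative, so abs() is not needed
df1$CStage <- sapply(nums, function(x) diff(sort(as.numeric(x))))

print(df1)
```

Note that sapply only simplifies to a vector when every element holds exactly two numbers; otherwise it returns a list.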
Upvotes: 1
Reputation: 388862
As @Cath mentioned in the comments, you could use sapply, convert each element to numeric and take the difference between them with diff.
num_list <- stringr::str_extract_all(df$CStage,"\\d{2}")
abs(sapply(num_list, function(x) diff(as.numeric(x))))
#[1] 2 5 8
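One caveat: the pattern "\\d{2}" matches exactly two digits at a time, so it only suits data like the sample, where every number has two digits. The sketch below uses made-up strings (not from the question) and base R's regmatches in place of str_extract_all, on the assumption that both treat these patterns the same way.

```r
# Hypothetical values with one-digit and three-digit numbers
x <- c("5-12cm", "100-120")

# Exactly two digits: misses "5" and cuts "100" and "120" short
two_digit <- regmatches(x, gregexpr("\\d{2}", x))

# One or more digits: picks up each whole number
any_len <- regmatches(x, gregexpr("\\d+", x))

print(two_digit)
print(any_len)
```

If numbers of other lengths can occur, "\\d+" is probably the safer pattern.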
Upvotes: 3