Sebastian Zeki

Reputation: 6874

How to subtract sequential numbers in a list

I have some text in a data frame, as follows:

Input

rownumber  CStage
1           38-40cm
2           27-22
3           32cm and 40cm

I want to subtract the two numbers in each CStage value, with the output being the positive difference:

Desired output

rownumber  CStage
1           2
2           5
3           8

I have used stringr::str_extract_all(df$CStage,"\\d{2}")

which gives me a list with each element containing two numbers

[[1]]
[1] "38" "40"

[[2]]
[1] "27" "22"

[[3]]
[1] "32" "40"

How can I then subtract the two numbers in each element (to get a positive output)?

Upvotes: 0

Views: 102

Answers (3)

G. Grothendieck

Reputation: 269481

1) strapply This can be done compactly using strapply in the gsubfn package. The regular expression captures the two numbers in each element of CStage in capture groups. These are passed to the anonymous function, written in formula notation, which returns the absolute value of their difference.

library(gsubfn)

transform(DF, CStage = strapply(CStage, 
                                "(\\d+)\\D+(\\d+)", 
                                ~ abs(as.numeric(x) - as.numeric(y)),
                                simplify = TRUE))

giving:

  rownumber CStage
1         1      2
2         2      5
3         3      8

2) Base R A base R solution can be obtained by replacing the non-digits in CStage with spaces and then reading the result with read.table, which creates a data frame with columns V1 and V2. Subtract those columns and take the absolute value.

transform(DF, CStage = with(read.table(text = gsub("\\D", " ", CStage)), abs(V1-V2)))

giving:

  rownumber CStage
1         1      2
2         2      5
3         3      8

3) dplyr/tidyr A solution with dplyr and tidyr, using a similar approach to (2), is:

library(dplyr)
library(tidyr)

DF %>%
  separate(CStage, into = c("V1", "V2"), sep = "\\D+", 
    extra = "drop", convert = TRUE) %>%
  mutate(CStage = abs(V1 - V2)) %>%
  select(rownumber, CStage)

giving:

  rownumber CStage
1         1      2
2         2      5
3         3      8

Note

The input in reproducible form is:

Lines <- "
rownumber,CStage
1,38-40cm
2,27-22
3,32cm and 40cm"

DF <- read.csv(text = Lines, as.is = TRUE)

Upvotes: 1

Andre Elrico

Reputation: 11480

You can also sort and then use diff.

sapply(regmatches(df1$CStage, gregexpr("\\d+", df1$CStage)),
       function(x) diff(sort(as.numeric(x))))
#[1] 2 5 8

Upvotes: 1

Ronak Shah

Reputation: 388862

As @Cath mentioned in the comments, you could use sapply to convert each list element to numeric and take the difference between the two values.

num_list <- stringr::str_extract_all(df$CStage,"\\d{2}")
abs(sapply(num_list, function(x) diff(as.numeric(x))))
#[1] 2 5 8
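
One possible generalisation, offered only as a sketch: the pattern "\\d{2}" matches exactly two digits at a time, so it would miss a one-digit number and split a three-digit one. The helper below, abs_diff_first_two, is a hypothetical name and not part of the answer above. It assumes that only the first two numbers in each string matter, and it returns NA when fewer than two are found. The last two input strings are made up for illustration.

```r
# Hypothetical helper: absolute difference of the first two numbers
# found in each string; NA when a string holds fewer than two numbers.
abs_diff_first_two <- function(x) {
  # "\\d+" matches runs of digits of any length
  nums <- regmatches(x, gregexpr("\\d+", x))
  vapply(nums, function(n) {
    if (length(n) < 2) return(NA_real_)
    n <- as.numeric(n[1:2])
    abs(n[1] - n[2])
  }, numeric(1))
}

# The first three strings come from the question; the last two are invented
CStage <- c("38-40cm", "27-22", "32cm and 40cm", "5cm", "100 to 8")
abs_diff_first_two(CStage)
#[1]  2  5  8 NA 92
```

Using vapply with numeric(1) rather than sapply means the result is always a numeric vector, even when some rows contain no usable pair of numbers.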

Upvotes: 3
