DN1

Reputation: 218

How to get min/max value from strings which account for negative numbers in R?

I have data where each cell can have multiple values, e.g.:

Gene       Pvalue1             Pvalue2              Pvalue3                  Beta
Ace    0.0381, ., 0.00357    0.01755, 0.001385    0.0037, NA , 0.039         -0.03,1,15
NOS          NA                  0.02              0.001, 0.00067              0.00009,25,30

I want to apply min() and max() to each gene's data in each column (I have thousands of genes in total), taking the smallest value in the p-value columns but the largest value in columns such as Beta. The output would look like this:

Gene       Pvalue1             Pvalue2              Pvalue3                  Beta
Ace        0.00357              0.001385             0.0037                   15
NOS          NA                  0.02                0.00067                  30

I asked an earlier question about this (Select min or max values within one cell (delimited string)), but the best solution there has a problem: it turns negative values into positive ones.

min2 = function(x) if(all(is.na(x))) NA else min(x,na.rm = T)
getmin = function(col) str_extract_all(col,"[0-9\\.]+") %>%
  lapply(.,function(x)min2(as.numeric(x)) ) %>%
  unlist() 

df %>%
    mutate_at(names(df)[-1],getmin)

How can I adjust this code to make sure negative numbers are still considered negative? I assume it relates to "[0-9\\.]+" but I can't find any clear resource on what these characters mean in R in this context.

Upvotes: 0

Views: 529

Answers (1)

Andy Baxter

Reputation: 7626

A simple fix would be to allow for a '-' sign immediately preceding the numbers:

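library(stringr)  # str_extract_all()
library(dplyr)    # %>%

min2 <- function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
# "-?" keeps an optional leading minus sign attached to each number
getmin <- function(col) str_extract_all(col, pattern = "-?[0-9\\.]+") %>%
  lapply(., function(x) min2(as.numeric(x))) %>%
  unlist()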

In the pattern argument of str_extract_all:

  • -? means 'an optional minus sign (zero or one occurrence)'
  • [0-9\\.] means 'a single digit (0-9) or a literal full stop'
  • + means 'the preceding item, occurring one or more times'
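
To see the difference the optional minus makes, here is a small check on the Beta cell from the question. It is only a sketch: it uses base R's regmatches() and gregexpr() rather than str_extract_all() so that it runs without any packages, and it writes the character class as [0-9.], since a dot inside square brackets is already literal.

```r
x <- "-0.03,1,15"

# Original pattern: a minus sign cannot be part of a match, so it is dropped.
old <- regmatches(x, gregexpr("[0-9.]+", x))[[1]]

# Adjusted pattern: an optional leading minus stays attached to the number.
new <- regmatches(x, gregexpr("-?[0-9.]+", x))[[1]]

as.numeric(old)  # values 0.03, 1, 15 (sign lost)
as.numeric(new)  # values -0.03, 1, 15 (sign kept)
```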

More details on how all these work can be found here.
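
The question also asks for the largest value in Beta, which getmin alone does not cover. One possible way to handle both cases is sketched below. Everything here is an assumption made for illustration: the data frame is the question's sample typed in by hand, pick() is a hypothetical helper (not part of any package), and the code uses base R instead of stringr/dplyr so that it runs on its own.

```r
# Sample data from the question, typed in by hand for illustration.
df <- data.frame(
  Gene    = c("Ace", "NOS"),
  Pvalue1 = c("0.0381, ., 0.00357", NA),
  Pvalue2 = c("0.01755, 0.001385", "0.02"),
  Pvalue3 = c("0.0037, NA , 0.039", "0.001, 0.00067"),
  Beta    = c("-0.03,1,15", "0.00009,25,30"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply `fun` (min or max) to the numbers in each cell.
pick <- function(col, fun) {
  # Treat missing cells as empty strings so they yield no matches.
  col  <- ifelse(is.na(col), "", as.character(col))
  hits <- regmatches(col, gregexpr("-?[0-9.]+", col))
  vapply(hits, function(x) {
    # A lone "." placeholder converts to NA and is ignored below.
    x <- suppressWarnings(as.numeric(x))
    if (all(is.na(x))) NA_real_ else fun(x, na.rm = TRUE)
  }, numeric(1))
}

p_cols <- c("Pvalue1", "Pvalue2", "Pvalue3")
df[p_cols] <- lapply(df[p_cols], pick, fun = min)  # smallest p-value per cell
df$Beta    <- pick(df$Beta, max)                   # largest beta per cell
df
```

Passing min or max as an argument avoids maintaining two near-identical functions; with the minus sign kept, pick("-0.03,1,15", min) gives -0.03 rather than 0.03.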

Hope that helps!

Upvotes: 1
