I have data in which each cell can contain multiple comma-separated values, e.g.:
Gene  Pvalue1             Pvalue2            Pvalue3             Beta
Ace   0.0381, ., 0.00357  0.01755, 0.001385  0.0037, NA , 0.039  -0.03,1,15
NOS   NA                  0.02               0.001, 0.00067      0.00009,25,30
I want to apply min() or max() to each gene's values in each column (I have thousands of genes in total), taking the smallest value for the p-value columns but the largest for columns such as Beta. The output should look like this:
Gene  Pvalue1  Pvalue2   Pvalue3  Beta
Ace   0.00357  0.001385  0.0037   15
NOS   NA       0.02      0.00067  30
I asked an earlier question about this (Select min or max values within one cell (delimited string)), but the best solution there has a problem: it drops the minus sign, so negative values are all treated as positive.
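library(dplyr)    # for %>% and mutate_at()
library(stringr)  # for str_extract_all()
# min() that returns NA instead of Inf (with a warning) when x has no non-missing values
min2 = function(x) if(all(is.na(x))) NA else min(x, na.rm = TRUE)
# extract every number in each cell, then keep the smallest
getmin = function(col) str_extract_all(col, "[0-9\\.]+") %>%
  lapply(., function(x) min2(as.numeric(x))) %>%
  unlist()
df %>%
  mutate_at(names(df)[-1], getmin)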
How can I adjust this code so that negative numbers stay negative? I assume the problem is in the pattern "[0-9\\.]+", but I can't find a clear explanation of what these characters mean in R in this context.
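A minimal reproduction of the sign problem, using only the Ace row's Beta cell (typed in by hand from the sample table) and the pattern from the code above:

```r
library(stringr)

# Beta cell for Ace, typed in from the sample data
beta_cell <- "-0.03,1,15"

# "[0-9\\.]+" matches runs of digits and full stops only, so the "-" is
# skipped and "-0.03" is extracted as "0.03"
extracted <- str_extract_all(beta_cell, "[0-9\\.]+")[[1]]
print(extracted)                   # "0.03" "1" "15"
print(min(as.numeric(extracted)))  # 0.03 rather than -0.03
```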
A simple fix is to allow an optional '-' sign immediately before each number:
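library(dplyr)    # for %>%
library(stringr)  # for str_extract_all()
min2 <- function(x) if(all(is.na(x))) NA else min(x, na.rm = TRUE)
# "-?" lets each extracted number keep a leading minus sign
getmin <- function(col) str_extract_all(col, pattern = "-?[0-9\\.]+") %>%
  lapply(., function(x) min2(as.numeric(x))) %>%
  unlist()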
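The fix above still takes the minimum of every column, including Beta. Here is a minimal sketch of the whole task (minimum for the p-value columns, maximum for Beta), assuming every column is stored as text and dplyr 1.0 or later is available (across() is the current replacement for mutate_at()). The helpers max2 and summarise_cells are hypothetical names introduced for this example:

```r
library(dplyr)
library(stringr)

# Sample data from the question, with every column stored as text
df <- data.frame(
  Gene    = c("Ace", "NOS"),
  Pvalue1 = c("0.0381, ., 0.00357", "NA"),
  Pvalue2 = c("0.01755, 0.001385", "0.02"),
  Pvalue3 = c("0.0037, NA , 0.039", "0.001, 0.00067"),
  Beta    = c("-0.03,1,15", "0.00009,25,30"),
  stringsAsFactors = FALSE
)

# Return NA instead of +/-Inf when a cell has no usable number
min2 <- function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
max2 <- function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)

# Hypothetical helper: pull every (optionally negative) number out of each
# cell, then reduce it with f; suppressWarnings() hides the coercion warning
# for pieces such as a lone "."
summarise_cells <- function(col, f) {
  str_extract_all(col, "-?[0-9\\.]+") %>%
    lapply(function(x) f(suppressWarnings(as.numeric(x)))) %>%
    unlist()
}

result <- df %>%
  mutate(
    across(starts_with("Pvalue"), ~ summarise_cells(.x, min2)),
    across(Beta, ~ summarise_cells(.x, max2))
  )
print(result)
```

With the sample data, result should match the expected output in the question: Pvalue1 0.00357 and NA, Pvalue2 0.001385 and 0.02, Pvalue3 0.0037 and 0.00067, Beta 15 and 30.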
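In the pattern argument of str_extract_all:
  "-?" means "zero or one minus sign", i.e. the '-' is optional;
  "[0-9\\.]" means "a single digit (0-9) or a full stop" (the doubled backslash is how the regex escape \. is written inside an R string; inside square brackets a plain '.' would also work);
  "+" means "the preceding item, here the bracket expression, occurring one or more times".
More details on how all these work can be found here.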
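One caveat: "-?[0-9\\.]+" only recognises plain decimals. If any p-values are written in scientific notation (e.g. 1e-05), the pattern splits "1e-05" into "1" and "-05", which as.numeric() reads as 1 and -5. A minimal sketch of an alternative that avoids regex number matching, assuming cells are always comma-delimited: split each cell on commas and let as.numeric() parse the pieces. The names parse_cell and getmin_split are hypothetical:

```r
library(stringr)

# Hypothetical helper: split one delimited cell on commas and parse each piece.
# as.numeric() understands signs and scientific notation; pieces such as "."
# or "NA" become NA (the coercion warning for "." is suppressed).
parse_cell <- function(cell) {
  pieces <- str_trim(str_split(cell, ",")[[1]])
  suppressWarnings(as.numeric(pieces))
}

# Column-level replacement for getmin() built on parse_cell()
getmin_split <- function(col) {
  vapply(col, function(cell) {
    x <- parse_cell(cell)
    if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
  }, numeric(1), USE.NAMES = FALSE)
}

parse_cell("-0.03,1,15")                 # -0.03, 1, 15
parse_cell("1e-05, 0.2")                 # 1e-05, 0.2
getmin_split(c("-0.03,1,15", "NA"))      # -0.03, NA

# The regex approach, by contrast, breaks the exponent apart
str_extract_all("1e-05, 0.2", "-?[0-9\\.]+")[[1]]  # "1" "-05" "0.2"
```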
Hope that helps!