Reputation: 218

How to separate values in a column and convert to numeric values?

I have a dataset where the values are collapsed so each row has multiple inputs per one column.

For example:

Gene   Score1                      
Gene1  NA, NA, NA, 0.03, -0.3 
Gene2  NA, 0.2, 0.1

I am trying to unpack this so that I can select the maximum absolute value per row for the Score1 column, and also record in a new column whether that maximum absolute value was originally negative.

So output of the example is:

Gene   Score1    Negatives1
Gene1   0.3          1
Gene2   0.2          0
#Score1 is now the maximum absolute value and if it used to be negative is tracked

I code this with:

dat2 <- dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
            Score1 = max(abs(Score1), na.rm = TRUE))

However, for some reason the above code gives me this error:

Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.

I thought that using convert = TRUE would make the values numeric, but the error suggests the code is still getting non-numeric values after I run separate_rows()?

Example input data:

structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3", 
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"))

Upvotes: 0

Answers (3)

Gregor Thomas

Reputation: 146164

If we look at the separate_rows output, I think the issue becomes clear: your separated column isn't numeric! I guess convert didn't pick it up, presumably because splitting on "," alone leaves a leading space on most values. We can force the conversion with as.numeric() (and ignore any warnings - we want things like " NA" to become NA).

You also have some issues in the summarise() call: it needs na.rm = TRUE in more places, the parentheses are mismatched, etc.

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
#   Gene  Score1 
#   <chr> <chr>  
# 1 Gene1  NA    
# 2 Gene1 " NA"  
# 3 Gene1 " NA"  
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2  NA    
# 7 Gene2 " 0.2" 
# 8 Gene2 " 0.1" 

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>% 
  mutate(Score1 = as.numeric(Score1)) %>% 
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(
    Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
    Score1 = max(abs(Score1), na.rm = TRUE)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
#   Gene  Negatives1 Score1
#   <chr>      <int>  <dbl>
# 1 Gene1          1    0.3
# 2 Gene2          0    0.2
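
As a possible alternative to the as.numeric() step, here is a minimal sketch that splits on a comma followed by optional whitespace, so that no leading spaces are left for convert = TRUE to trip over. It assumes dplyr and tidyr are installed; the regex ",\\s*" and the rebuilt example data are my own choices for illustration, not something taken from the question.

```r
library(dplyr)
library(tidyr)

# Example input from the question, rebuilt as a tibble
dat <- tibble(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1")
)

dat2 <- dat %>%
  # Split on a comma plus any following whitespace (assumed separator pattern),
  # so the pieces are clean strings such as "NA" and "0.03"
  separate_rows(Score1, sep = ",\\s*", convert = TRUE) %>%
  group_by(Gene) %>%
  summarise(
    # 1 if the smallest value is the negative of the largest absolute value
    Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
    # Negatives1 is computed first because this line overwrites Score1
    Score1 = max(abs(Score1), na.rm = TRUE)
  )
```

If the separators in the real data are less regular than in the example, the explicit as.numeric() conversion is probably the safer choice.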

Upvotes: 4

rkabuk

Reputation: 139

Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.

Well, this tells you that you are passing non-numeric arguments to a mathematical function (here abs(), which is called inside max()).

A quick check with dat2[dat2$Gene == "Gene1",] showed me that your data is stored as text after the separation:

  Gene  Score1
  <chr> <chr>  
1 Gene1  NA    
2 Gene1 " NA" 
3 Gene1 " NA" 
4 Gene1 " 0.03"
5 Gene1 " -0.3"

Simply convert the column to numeric :)
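
For illustration, one way the conversion could look in base R, without any packages. This is only a sketch: the names parts, vals, pick and result are made up for the example, and it assumes every row has at least one non-missing value (otherwise which.max() returns nothing and vapply() fails).

```r
# Example input from the question as a plain data.frame
dat <- data.frame(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1"),
  stringsAsFactors = FALSE
)

# Split each string on commas; the pieces still carry leading spaces
parts <- strsplit(dat$Score1, ",", fixed = TRUE)

# Trim whitespace, turn the "NA" marker into a real NA, then convert
vals <- lapply(parts, function(p) {
  p <- trimws(p)
  p[p == "NA"] <- NA_character_
  as.numeric(p)
})

# Per row, keep the signed value with the largest absolute value
# (which.max() skips NA entries)
pick <- vapply(vals, function(v) v[which.max(abs(v))], numeric(1))

result <- data.frame(
  Gene       = dat$Gene,
  Score1     = abs(pick),
  Negatives1 = as.integer(pick < 0)
)
```

For the example data, result has Score1 values 0.3 and 0.2 and Negatives1 values 1 and 0.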

Upvotes: 0

Wimpel

Reputation: 27792

Here is a data.table approach:

library( matrixStats )
library( data.table)
#split strings
l <- data.table::tstrsplit( DT$Score1, ", " )
#create value columns
DT[, paste0( "val_", 1:length( l ) ) := lapply( l, as.numeric ) ]
#find the max and count the negatives in the value columns
DT[, `:=`( Score1    = rowMaxs( as.matrix(.SD), na.rm = TRUE ),
           negatives = rowSums( .SD < 0, na.rm = TRUE ) ), 
   .SDcols = patterns("^val_")]
#get relevant columns
DT[, .(Gene, Score1, negatives) ]
# Gene Score1 negatives
# 1: Gene1   0.03         1
# 2: Gene2   0.20         0
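
Note that the output above gives the largest signed value (0.03 for Gene1) and a count of negative values, not the maximum absolute value the question asks for. A hedged variant that aims at the requested output is sketched below; it assumes data.table is installed and that every gene has at least one non-missing value. The names long, res and val are made up for the example.

```r
library(data.table)

DT <- data.table(
  Gene   = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1")
)

# Long format: one numeric value per row, split and trimmed per gene
long <- DT[, .(val = as.numeric(trimws(strsplit(Score1, ",", fixed = TRUE)[[1]]))),
           by = Gene]

# Per gene, keep the row whose value has the largest absolute value
res <- long[!is.na(val), .SD[which.max(abs(val))], by = Gene]

# Store the absolute value and flag whether it was originally negative
res[, `:=`(Score1 = abs(val), Negatives1 = as.integer(val < 0))]
res[, val := NULL]
```

Working in long format avoids creating one val_ column per split position, at the cost of an extra grouping step.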

Upvotes: 0
