Reputation: 218
I have a dataset where the values are collapsed so each row has multiple inputs per one column.
For example:
Gene Score1
Gene1 NA, NA, NA, 0.03, -0.3
Gene2 NA, 0.2, 0.1
I am trying to unpack this and then select the maximum absolute value per row for the Score1
column, while also tracking in a new column whether that maximum absolute value was originally negative.
So output of the example is:
Gene Score1 Negatives1
Gene1 0.3 1
Gene2 0.2 0
#Score1 is now the maximum absolute value and if it used to be negative is tracked
I code this with:
dat2 <- dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
group_by(Gene) %>%
#Create negative column to track max absolute values that were negative
summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
Score1 = max(abs(Score1), na.rm = TRUE))
However, for some reason the above code gives me this error:
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.
I thought that using convert = TRUE would make the values numeric, but the error suggests the code is still getting non-numeric values after I run separate_rows(). Why is that?
Example input data:
structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3",
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
Upvotes: 0
Views: 185
Reputation: 145755
If we look at the separate_rows() output, I think the issue becomes clear: your separated column isn't numeric! I guess convert didn't pick it up. We can force the conversion with as.numeric() (and ignore the warnings - we want things like " NA" to become NA).
You also have some issues in the summarise() call: it needs more na.rm = TRUE, it has mismatched parens, etc.
dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
# Gene Score1
# <chr> <chr>
# 1 Gene1 NA
# 2 Gene1 " NA"
# 3 Gene1 " NA"
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2 NA
# 7 Gene2 " 0.2"
# 8 Gene2 " 0.1"
dat %>%
tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
mutate(Score1 = as.numeric(Score1)) %>%
group_by(Gene) %>%
#Create negative column to track max absolute values that were negative
summarise(
Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
Score1 = max(abs(Score1), na.rm = TRUE)
)
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
# Gene Negatives1 Score1
# <chr> <int> <dbl>
# 1 Gene1 1 0.3
# 2 Gene2 0 0.2
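As a possible variation (a minimal sketch, not something tested beyond the example data): splitting on a comma plus any trailing whitespace avoids the " NA" pieces, so as.numeric() should no longer warn. The sep regex and the helper column name top are my own choices here, and which.max() is used to pick the signed value with the largest magnitude. A group that is entirely NA would need separate handling.

```r
library(dplyr)
library(tidyr)

# Example input from the question, rebuilt as a plain data.frame
dat <- data.frame(
  Gene = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1")
)

res <- dat %>%
  # Split on a comma and any whitespace after it, so no " NA" pieces remain
  separate_rows(Score1, sep = ",\\s*") %>%
  # Explicit conversion; the string "NA" becomes a numeric NA
  mutate(Score1 = as.numeric(Score1)) %>%
  group_by(Gene) %>%
  # which.max() skips NA, so `top` is the signed value with the largest magnitude
  summarise(top = Score1[which.max(abs(Score1))], .groups = "drop") %>%
  # Flag negatives, then keep only the magnitude
  mutate(Negatives1 = as.integer(top < 0), Score1 = abs(top)) %>%
  select(Gene, Score1, Negatives1)

res
```

Keeping the signed value first and deriving both columns from it avoids comparing min() against -max(abs()) separately.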
Upvotes: 4
Reputation: 139
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.
This tells you that you are passing non-numeric arguments to a mathematical function (here abs(), inside max()).
A quick check with dat2[dat2$Gene == "Gene1",]
shows that some of your data is stored as text after the separation:
Gene Score1
<chr> <chr>
1 Gene1 NA
2 Gene1 " NA"
3 Gene1 " NA"
4 Gene1 " 0.03"
5 Gene1 " -0.3"
Simply convert the column to numeric :)
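One way to see the problem and the conversion in isolation, as a small base R sketch using the Gene1 string from the question (the variable names pieces, msg and scores are just for illustration):

```r
# Splitting on "," alone leaves character pieces with leading spaces
pieces <- unlist(strsplit("NA, NA, NA, 0.03, -0.3", ","))
is.character(pieces)

# abs() on character input raises the error from the question
msg <- tryCatch(abs(pieces), error = function(e) conditionMessage(e))
msg

# Trim the whitespace, then convert; "NA" strings become numeric NA
scores <- as.numeric(trimws(pieces))
max(abs(scores), na.rm = TRUE)
```

Trimming first means as.numeric() sees "NA" rather than " NA", so the conversion should not emit coercion warnings.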
Upvotes: 0
Reputation: 27732
Here is a data.table approach:
library( matrixStats )
library( data.table )
#DT is the example input data (a data.table)
#split strings
l <- data.table::tstrsplit( DT$Score1, ", " )
#create numeric value columns
DT[, paste0( "val_", seq_along( l ) ) := lapply( l, as.numeric ) ]
#find the max absolute value, and flag whether it came from a negative value
DT[, `:=`( Score1 = rowMaxs( abs( as.matrix(.SD) ), na.rm = TRUE ),
negatives = +( rowMins( as.matrix(.SD), na.rm = TRUE ) ==
-rowMaxs( abs( as.matrix(.SD) ), na.rm = TRUE ) ) ),
.SDcols = patterns("^val_")]
#get relevant columns
DT[, .(Gene, Score1, negatives) ]
# Gene Score1 negatives
# 1: Gene1 0.3 1
# 2: Gene2 0.2 0
Upvotes: 0