Reputation: 4949
Hi, I have a data frame:
d <- data.frame(type = c("rna", "rna", "rna"), value = c(1, 2, 3))
d2 <- data.frame(type = c("dna", "dna"), value = c(20, 30))
d3 <- data.frame(type = c("protein", "protein", "protein"), value = c(-9.6, 300, 1000))
df <- rbind(d, d2, d3)
type value
1 rna 1.0
2 rna 2.0
3 rna 3.0
4 dna 20.0
5 dna 30.0
6 protein -9.6
7 protein 300.0
8 protein 1000.0
What I would like to do is use either mean or max conditionally, per type: use max if the group has even one value that is < 0, otherwise use mean. In this example the final df should look like this:
value type
1 1000 protein
2 25 dna
3 2 rna
I tried to summarise like this, but it throws an error:
df %>%
group_by(type) %>%
summarise_all(
funs(
if (. < 0 ){max}
else{mean}
) )
Upvotes: 1
Views: 523
Reputation: 887213
We can wrap the condition with any, because . < 0 is a logical vector of length greater than 1, whereas if/else works on a single TRUE/FALSE element. Wrapping with any returns that single element:
df %>%
group_by(type) %>%
summarise_all(funs(if(any(. < 0)) max(.) else mean(.)))
# A tibble: 3 x 2
# type value
# <fct> <dbl>
#1 rna 2
#2 dna 25
#3 protein 1000
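For readers on newer dplyr releases: funs() was deprecated in dplyr 0.8.0 and summarise_all() was superseded by across() in dplyr 1.0.0. The following is a minimal sketch of the same idea under that assumption (dplyr 1.0.0 or later installed); the result name is only for illustration. It rebuilds the question's data so it runs on its own.

```r
library(dplyr)

# Same data as in the question, built in one call
df <- data.frame(
  type  = c("rna", "rna", "rna", "dna", "dna",
            "protein", "protein", "protein"),
  value = c(1, 2, 3, 20, 30, -9.6, 300, 1000)
)

# across(everything(), ...) applies the lambda to every non-grouping column.
# any() collapses the logical vector to a single TRUE/FALSE for if/else.
result <- df %>%
  group_by(type) %>%
  summarise(across(everything(),
                   ~ if (any(.x < 0)) max(.x) else mean(.x)))

print(result)
```

The row order of the result depends on whether type is a character or a factor column, so it may differ from the output shown above.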
If we instead need the mean of only the non-negative values in each group:
df %>%
group_by(type) %>%
summarise_all(funs(mean(.[.>= 0], na.rm = TRUE)))
NOTE: Here, we assume that the original dataset has more numeric columns to take the mean of. It is better to add na.rm = TRUE wherever the function accepts that parameter, so that any NA values in the dataset are dropped before the computation.
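As a small base R illustration of the NA point (the vector x is made up for this sketch and is not part of the question's data): subsetting with x >= 0 keeps NA entries, so na.rm = TRUE still matters after the filter.

```r
# Hypothetical values: one missing, one negative
x <- c(4, NA, -2, 6)

# The comparison yields TRUE, NA, FALSE, TRUE; indexing with NA keeps an NA
kept <- x[x >= 0]

mean_default <- mean(kept)               # NA, because an NA is still present
mean_narm    <- mean(kept, na.rm = TRUE) # mean of 4 and 6

print(kept)
print(mean_default)
print(mean_narm)
```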
Upvotes: 2
Reputation: 2283
I think a regular summarise statement is more intuitive in this situation.
df %>%
group_by(type) %>%
summarise(value = ifelse(any(value<0),max(value),mean(value)))
# type value
# <fct> <dbl>
#1 rna 2.00
#2 dna 25.0
#3 protein 1000
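One caveat worth keeping in mind with this approach: ifelse returns a result with the same length as its test, so it fits here only because any(), max() and mean() each return a single value. A short base R sketch of the difference from a plain if/else (the variable names are only for illustration):

```r
value <- c(-9.6, 300, 1000)

# Length-1 test, length-1 branches: ifelse behaves like if/else here
picked <- ifelse(any(value < 0), max(value), mean(value))

# With a longer branch, ifelse keeps only as many elements as the test has
a <- ifelse(TRUE, c(1, 2, 3), 0)

# A plain if/else returns the whole branch
b <- if (TRUE) c(1, 2, 3) else 0

print(picked)
print(a)
print(b)
```

For per-group summaries that should yield one number, either form works; if/else is the safer habit when a branch might return more than one element.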
Upvotes: 1