Reputation: 42343
Computing medians seems to be a bit of an Achilles' heel for R (i.e. there is no data.frame method). What is the least amount of typing needed to get group medians from a data frame using dplyr?
my_data <- structure(list(group = c("Group 1", "Group 1", "Group 1", "Group 1",
"Group 1", "Group 1", "Group 1", "Group 1", "Group 1", "Group 1",
"Group 1", "Group 1", "Group 1", "Group 1", "Group 1", "Group 2",
"Group 2", "Group 2", "Group 2", "Group 2", "Group 2", "Group 2",
"Group 2", "Group 2", "Group 2", "Group 2", "Group 2", "Group 2",
"Group 2", "Group 2"), value = c("5", "3", "6", "8", "10", "13",
"1", "4", "18", "4", "7", "9", "14", "15", "17", "7", "3", "9",
"10", "33", "15", "18", "6", "20", "30", NA, NA, NA, NA, NA)), .Names = c("group",
"value"), class = c("tbl_df", "data.frame"), row.names = c(NA,
-30L))
library(dplyr)
# groups 1 & 2
my_data_groups_1_and_2 <- my_data[my_data$group %in% c("Group 1", "Group 2"), ]
# compute medians per group
medians <- my_data_groups_1_and_2 %>%
  group_by(group) %>%
  summarize(the_medians = median(value, na.rm = TRUE))
Which gives:
Error in summarise_impl(.data, dots) :
STRING_ELT() can only be applied to a 'character vector', not a 'double'
In addition: Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
argument is not numeric or logical: returning NA
What is the least-effort workaround here?
Upvotes: 3
Views: 7596
Reputation: 70336
As commented by ivyleavedtoadflax, the error occurs because median is given an argument that is neither numeric nor logical: your value column is of type character (you can tell the values are not numeric because the numbers are quoted). Here are two simple ways to solve it:
my_data %>%
  filter(group %in% c("Group 1", "Group 2")) %>%
  group_by(group) %>%
  summarize(the_medians = median(as.numeric(value), na.rm = TRUE))
Or
my_data %>%
  filter(group %in% c("Group 1", "Group 2")) %>%
  mutate(value = as.numeric(value)) %>%
  group_by(group) %>%
  summarize(the_medians = median(value, na.rm = TRUE))
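The behaviour of median on character input can be reproduced without dplyr. Below is a minimal base R sketch using made-up vectors (odd_chr and even_chr are hypothetical names, not part of your data). It suggests a likely reason for the type mismatch in the error: Group 1 has an odd number of non-missing values and Group 2 an even number, so the two groups probably return results of different types.

```r
# Hypothetical toy vectors, not taken from the question's data
odd_chr  <- c("5", "3", "6")        # odd number of values
even_chr <- c("7", "3", "9", "10")  # even number of values

# Odd count: median() picks the middle element after sorting,
# so the result is still a character string
class(median(odd_chr))

# Even count: median() averages the two middle elements with mean(),
# which warns and returns a numeric NA for character input
class(suppressWarnings(median(even_chr)))

# Converting first gives an ordinary numeric median in both cases
median(as.numeric(odd_chr))   # 5
median(as.numeric(even_chr))  # 8
```

Either of the two solutions above avoids this, because as.numeric is applied before median ever sees the values.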
To check the structure of your data, including the type of each column, you can conveniently use str:
str(my_data)
#Classes ‘tbl_df’ and 'data.frame': 30 obs. of 2 variables:
# $ group: chr "Group 1" "Group 1" "Group 1" "Group 1" ...
# $ value: chr "5" "3" "6" "8" ...
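If several columns were read in as character, a related option is to let base R guess the types. This is only a sketch on a hypothetical toy data frame (toy and converted are illustrative names), using sapply to list the column classes and type.convert to convert number-like character columns. Note that whole numbers come back as integer rather than double, which median handles fine.

```r
# Hypothetical toy data frame mimicking the structure in the question
toy <- data.frame(
  group = c("Group 1", "Group 1", "Group 2"),
  value = c("5", "3", NA),
  stringsAsFactors = FALSE
)

# List the class of every column; both are "character" here
sapply(toy, class)

# Let R guess column types; as.is = TRUE keeps text columns as
# character instead of turning them into factors
converted <- type.convert(toy, as.is = TRUE)

# group stays "character", value becomes "integer"
sapply(converted, class)
```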
Upvotes: 4