Reputation: 13
I have a dataframe which looks like this:
data <- data.frame(id=c(1,2,6,3,7,1,5,7),
class=c('apple','boy','boy','apple','boy','apple','apple','boy'),
type=c('type1','type1','type2','type2','type3','type4','type4','type4'),
col1=c(-0.9,0.8,0.7,-0.6,-0.5,0.4,0.3,0.9), col2=c(-6.9,2.8,0.4,-1.6,-0.8,0.6,0.2,-0.1),
col3=c(6.7,0.9,0.2,-0.7,-0.8,1.6,3.2,0.1))
id class type col1 col2 col3
1 apple type1 -0.9 -6.9 6.7
2 boy type1 0.8 2.8 0.9
6 boy type2 0.7 0.4 0.2
3 apple type2 -0.6 -1.6 -0.7
7 boy type3 -0.5 -0.8 -0.8
1 apple type4 0.4 0.6 1.6
5 apple type4 0.3 0.2 3.2
7 boy type4 0.9 -0.1 0.1
I am trying to create a dataframe which has the same columns (i.e., col1, col2, col3, ...) but the values in it should be median((data %>% filter(class=="apple"))$col1) - median((data %>% filter(class=="boy"))$col1)
and so on for each type
for each column.
So, the final dataframe will look like
type col1 col2 col3
type1 -0.1 -4.1 3.7
type2 0.7 0.4 0.2
type3 -0.5 -0.8 -0.8
type4 0.4 0.6 1.6
I can do this by creating individual dataframes of each type
and calculating the difference of medians of both the classes and append the vector to an empty dataframe with bind_rows()
.
But is there any better and easier method to do this?
Upvotes: 1
Views: 76
Reputation: 160407
The method you want is something like this:
data %>%
group_by(type) %>%
summarize(across(col1:col3, ~ median(.[class=="boy"] - median(.[class=="boy"]))))
# # A tibble: 4 x 4
# type col1 col2 col3
# <chr> <dbl> <dbl> <dbl>
# 1 type1 0 0 0
# 2 type2 0 0 0
# 3 type3 0 0 0
# 4 type4 0 0 0
though in this instance it will return all 0
s because you only have one "boy"
within each group.
Post question-edit, here's the updated code and results:
data %>%
group_by(type) %>%
summarize(across(col1:col3, ~ median(.[class=="apple"]) - median(.[class=="boy"])))
# # A tibble: 4 x 4
# type col1 col2 col3
# <chr> <dbl> <dbl> <dbl>
# 1 type1 -1.7 -9.700 5.8
# 2 type2 -1.3000 -2 -0.9000
# 3 type3 NA NA NA
# 4 type4 -0.55 0.5 2.3
The NA
s are because type3
only has "boy"
, no "apple"
.
(At least we aren't comparing "apple"
to "orange"
, that would have been rather cliché ;-)
Upvotes: 2
Reputation: 78917
Here is a way how you can get your solution: That was a tough!
library(dplyr)
data %>%
arrange(id) %>%
filter(class == "boy" | type=="type1") %>%
group_by(type) %>%
summarise(across(starts_with("col"), sum))
type col1 col2 col3
<chr> <dbl> <dbl> <dbl>
1 type1 -0.1 -4.1 7.6
2 type2 0.7 0.4 0.2
3 type3 -0.5 -0.8 -0.8
4 type4 0.9 -0.1 0.1
Upvotes: 0