Reputation: 1396
So, the problem for this question is, I cannot post actual code because of an agreement I had to sign and I'm new at R and probably unable to explain that well, , but maybe someone can help me anyway...
Let's say I have some data:
A B C D
F1 6.6 10 10
F1 3.1 10 10
A1 1.0 20 10
B1 3.4 20 20
So, for every A, the C and D values are the same. But I want to use dplyr to find Bmean like so:
A Bmean C D
F1 4,85 10 10
A1 1.0 20 10
B1 3.4 20 20
How would I do that? My idea was to use something like
dplyr::group_by(A) %>% dplyr::summarize(Bmean = mean(B))
but C and D seem to disappear after this operation. Would it make sense to group_by all columns I want to keep? Or how would that work?
Just to clarify, I would like to use the dplyr syntax, since it's part of a bigger operation, if possible.
Upvotes: 6
Views: 15243
Reputation: 1807
With dplyr
newer than version 1.1.0, a new function called reframe()
was introduced, though still labeled experimental.
Answering this question, reframe()
could be helpful as the following code snippet:
tibble::tribble(
~A, ~B, ~C, ~D,
"F1", 6.6, 10, 10,
"F1", 3.1, 10, 10,
"A1", 1.0, 20, 10,
"B1", 3.4, 20, 20,
) |> dplyr::group_by(A) |>
dplyr::reframe(
Bmean = mean(B),
dplyr::across(C:D, .fns = ~unique(.x)),
)
Upvotes: 0
Reputation: 142
I would like to add an awnser which specifically solves the problem with the use of dplyr. While I'm sure, there are more elegant ways of doing this, the following proposal can retain columns with additional descriptive variables in a summarized/aggregated data frame. Also if this is not the case the code will not work protecting you from mistakes in bigger dataframes.
library(dplyr)
library(tibble)
df <- tribble(
~A , ~B , ~c , ~D ,
"F1", 6.6, 10 , 10 ,
"F1", 3.1, 10 , 10 ,
"A1", 1.0, 20 , 10 ,
"B1", 3.4, 20 , 20
)
The following code drops the columns C and D
df %>%
group_by(A) %>%
summarise(Bmean = mean(B))
This code keeps the columns C and D. Note that this only works, if there is the same variable in each row of the group. But since the variables should be retained and not have an influence in thr grouping behaviour this should be the case anyways.
df %>%
group_by(A) %>%
summarise(Bmean = mean(B),
C = unique(C),
D = unique(D))
in fact you can also include the groups in the group_by expression, if the grouping levels are not "smaller" than the grouped variable
Group1:
A , B , C , D
"F1", 6.6, 10 , 10
"F1", 3.1, 10 , 10
Group2:
"A1", 1.0, 20 , 10
Group3:
"B1", 3.4, 20 , 20
Note that columns C and D maintain the same value within each group. This means they could safely be used in the grouping expression and thus be retained.
So in your case also this would work:
group_by(A,C,D)
Upvotes: 6
Reputation: 121
OK, this question is a bit old but in case someone still ends up here like me, here's a solution with the same package (dplyr). Works especially well if you have many columns to keep unchanged.
df %>%
group_by(A) %>%
summarise(Bmean = mean(B),
across(C:D, .fns = ~.x))
It's applying a function (that does nothing) to columns C to D (note: I have uppercase 'C' here instead of the lowercase 'c' in the original post)
Upvotes: 5
Reputation: 886938
We can use data.table
library(data.table)
setDT(df1)[,.(Bmean = mean(B)) , .(A, C, D)]
# A C D Bmean
#1: F1 10 10 4.85
#2: A1 20 10 1.00
#3: B1 20 20 3.40
Upvotes: 3
Reputation: 3587
You can do this using base R
aggregate(data=df1,B~.,FUN = mean)
Upvotes: 4
Reputation: 824
You can simply use something like this:
aggregate(cbind(B, C, D) ~ A, df, mean)
Here df is your data frame with columns A, B, C and D.
Upvotes: 3