Reputation: 3456
Phrased differently, how do you aggregate one column (e.g. the column
) while keeping another the same (e.g. the location
)?
This MWE illustrates my problem. How would I add back in the location
column after doing a summarise()
? Is there some solution that involves "going up a level," before the summarise()
so that I can maintain the original columns?
test <- as.data.table(data.frame(event_id = c("A","B","A","A","B"),
income = c(1,2,3,4,5),
location = c("PlaceX","PlaceY","PlaceX","PlaceX","PlaceY")))
test
event_id income location
1: A 1 PlaceX
2: B 2 PlaceY
3: A 3 PlaceX
4: A 4 PlaceX
5: B 5 PlaceY
test %>%
group_by(event_id) %>%
summarise(mean_inc = mean(income))
Source: local data table [2 x 2]
event_id mean_inc
(fctr) (dbl)
1 A 2.666667
2 B 3.500000
The following doesn't work:
test %>%
group_by(event_id) %>%
summarise(mean_inc = mean(income),
location = location)
Source: local data table [5 x 3]
event_id mean_inc location
(fctr) (dbl) (fctr)
1 A 2.666667 PlaceX
2 A 2.666667 PlaceX
3 A 2.666667 PlaceX
4 B 3.500000 PlaceY
5 B 3.500000 PlaceY
My desired output is:
Source: local data table [2 x 3]
event_id location mean_inc
(fctr) (fctr) (dbl)
1 A PlaceX 2.666667
2 B PlaceY 3.500000
Upvotes: 1
Views: 690
Reputation: 36076
An option might be to use mutate
and then pull out just one value per group via distinct
.
The usefulness of this would depend on the real use case: this seems most useful if your new variable has the same name as the original variable it summarizes. Otherwise you would end up having your original, unsummarized variable in the final dataset.
distinct
works here because the object is still grouped.
test %>%
group_by(event_id) %>%
mutate(income = mean(income)) %>%
distinct()
Source: local data table [2 x 3]
event_id income location
(fctr) (dbl) (fctr)
1 A 2.666667 PlaceX
2 B 3.500000 PlaceY
In dplyr_0.4.3.9000 you would need .keep_all = TRUE
in distinct
.
Upvotes: 1
Reputation: 263342
I hope I have understood your desires. Do an inner_join
to recover the missing columns (assuming they are matched 1-1 with the group_by
argument):
newtest <- test %>%
group_by(event_id) %>%
summarise(mean_inc = mean(income)) %>% inner_join(test[-(1:2)])
#Joining by: "event_id"
newtest
#-----------------
Source: local data table [3 x 4]
event_id mean_inc income location
(fctr) (dbl) (dbl) (fctr)
1 A 2.666667 3 PlaceX
2 A 2.666667 4 PlaceX
3 B 3.500000 5 PlaceY
It's also possible you wanted to match on both event_id and location:
test %>%
group_by(event_id,location) %>%
summarise(mean_inc = mean(income))
#---------
#Source: local data table [2 x 3]
#Groups: event_id
event_id location mean_inc
(fctr) (fctr) (dbl)
1 A PlaceX 2.666667
2 B PlaceY 3.500000
Upvotes: 1