Reputation: 75
I have the following data.frame:
Engine | MPG | Test_Distance
1. V6 | 17 | 751
2. V4 | 22 | 1850
3. V4-Hybrid| 26 | 210
4. V6-Hybrid| 24 | 85
5. Flat4 | 26 | 4560
6. V6-Hybrid| 28 | 124
7. Flat4 | 17 | 3455
8. V4 | 17 | 1642
Where Engine is a Factor vector, and MPG and Test_Distance are both numeric vectors.
Prior to making more complex stat calculations and plots, I want to simplify the data.frame by sorting:
Note: there are many other columns in this data.frame, but I only put three to simplify the approach.
Here's the resulting data.frame I'd like to have:
Engine_Type | MPG_avg | Test_Distance_total
1. Vx | 18.7 | 4243
2. Vx_Hybrid| 26 | 419
3. Flatx | 14.4 | 8015
4. TOTALS | 19.7 | 12677
I tried using the dplyr
and plyr
packages and following functions: aggregate
, rowSums
, colSums
, data.table
. But to no avail. I thought of creating a temp data.frame, then re-integrate the new values in the original data.frame, but I'm hoping there's a quicker way to do it.
Any suggestion?
Upvotes: 2
Views: 962
Reputation: 887391
We replace the numbers in 'Engine' with 'x' within the group_by
, summarise
to get the mean
and sum
of the 'MPG' and 'Test_Distance' respectively, and bind the rows with the mean
and sum
of the summarised output
library(dplyr)
df1 %>%
group_by(Engine = sub("\\d+", "x", Engine)) %>%
summarise(MPG = mean(MPG), Test_Distance_total = sum(Test_Distance))%>%
bind_rows(tibble(Engine = 'TOTALS',
MPG = mean(.$MPG),
Test_Distance_total = sum(.$Test_Distance_total)))
# A tibble: 4 x 3
# Engine MPG Test_Distance_total
# <chr> <dbl> <int>
#1 Flatx 21.5 8015
#2 Vx 18.7 4243
#3 Vx-Hybrid 26.0 419
#4 TOTALS 22.1 12677
df1 <- structure(list(Engine = c("V6", "V4", "V4-Hybrid", "V6-Hybrid",
"Flat4", "V6-Hybrid", "Flat4", "V4"), MPG = c(17L, 22L, 26L,
24L, 26L, 28L, 17L, 17L), Test_Distance = c(751L, 1850L, 210L,
85L, 4560L, 124L, 3455L, 1642L)), .Names = c("Engine", "MPG",
"Test_Distance"), class = "data.frame", row.names = c("1.", "2.",
"3.", "4.", "5.", "6.", "7.", "8."))
Upvotes: 2