David
David

Reputation: 95

Summarising by a group variable in r

I have a data frame as follows:

 head(newStormObject)
     FATALITIES   INJURIES    PROPVALDMG CROPVALDMG      EVTYPE     total
 1           0          15    2.5e+05          0        TORNADO        15
 2           0           0    2.5e+04          0        TORNADO         0
 3           0           3    2.5e+07          0        TORNADO         3 
 4           0           3    2.5e+07          0        TORNADO         3
 5           0           0    0.0e+00          0      TSTM WIND         1
 6           0           0    0.0e+00          0           HAIL         2
 7           0           0    0.0e+00          0           HAIL         3
 8           0           0    0.0e+00          0      TSTM WIND         0
 9           0           0    0.0e+00          0           HAIL         0
10           0           0    0.0e+00          0      TSTM WIND         0
11           0           0    0.0e+00          0      TSTM WIND         0
12           0           0    0.0e+00          0           HAIL         1
13           0           0    0.0e+00          0           HAIL         1
14           0           0    0.0e+00          0           HAIL         5
15           0           0    0.0e+00          0      TSTM WIND         0

What I am attempting to do is group by the event type (EVTYPE) and sum the totals column accordingly so printing the data frame would look as follows:

       FATALITIES   INJURIES  PROPVALDMG CROPVALDMG      EVTYPE     total
 1           0          15    2.5e+05          0        TORNADO       21
 2           0           0    0.0e+00          0           HAIL       11
 3           0           0    0.0e+00          0      TSTM WIND        0

To try to do this, I wrote the following

newStormObject %>% group_by(EVTYPE, total) %>% summarise(EVTYPE, sum(total))

but I got an error saying 'Error: cannot modify grouping variable'.

The first two statements in the 'pipe statement' appear to work fine but just gives the output according to the first block, so the error seems to come from the 'summarise' statement.

Any suggestion to solve this would be appreciated.

Upvotes: 0

Views: 133

Answers (2)

lmo
lmo

Reputation: 38520

Here is a base R solution that returns the same values (in a slightly different order)

merge(df[!duplicated(df$EVTYPE), -length(df)],
         aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")
     EVTYPE FATALITIES INJURIES PROPVALDMG CROPVALDMG total
1      HAIL          0        0          0          0    12
2   TORNADO          0       15     250000          0    21
3 TSTM_WIND          0        0          0          0     1

duplicated is used to select the first observation of each EVTYPE level, aggregate is used to calculate the sum of the total variable. These results are merged on EVTYPE.

The rows are ordered by the order that factor automatically stores factor variables, that is alphabetically. The columns are slightly disordered from the desired output due to merge which puts the by variables in the front of the resulting data set. Fixing the columns is a matter of passing the names of the original data.frame.

merge(df[!duplicated(df$EVTYPE), -length(df)],
      aggregate(total ~ EVTYPE, data=df, sum), by="EVTYPE")[, names(df)]
  FATALITIES INJURIES PROPVALDMG CROPVALDMG    EVTYPE total
1          0        0          0          0      HAIL    12
2          0       15     250000          0   TORNADO    21
3          0        0          0          0 TSTM_WIND     1

Upvotes: 0

akrun
akrun

Reputation: 887731

We can take the first value for all the other columns using slice after updating the 'total' with the sum of 'total'.

library(dplyr)
df1 %>% 
   group_by(EVTYPE) %>% 
   mutate(total = sum(total)) %>%
   slice(1L) %>%
   arrange(desc(total))
#      FATALITIES INJURIES PROPVALDMG CROPVALDMG    EVTYPE total
#       <int>    <int>      <dbl>      <int>     <chr> <int>
#1          0       15     250000          0   TORNADO    21
#2          0        0          0          0      HAIL    12
#3          0        0          0          0 TSTM WIND     1

NOTE: The 'total' for 'EVTYPE' "HAIL" is 12 based on the example

Upvotes: 1

Related Questions