user4560874

Reputation: 9

Odd behaviour of the group_by function in dplyr (RStudio Version 0.98.1087)

I am an R novice working on a data frame 'damageData' in RStudio. Here is a brief summary of the data frame:

> str(damageData)
'data.frame':    902297 obs. of  9 variables:
  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
  $ PROPDMGEXP: num  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
  $ CROPDMGEXP: num  0 0 0 0 0 0 0 0 0 0 ...
  $ Property  : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
  $ Crops     : num  0 0 0 0 0 0 0 0 0 0 ...

> head(damageData, 10)
      EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
 1  TORNADO          0       15    25.0       1000       0          0
 2  TORNADO          0        0     2.5       1000       0          0
 3  TORNADO          0        2    25.0       1000       0          0
 4  TORNADO          0        2     2.5       1000       0          0
 5  TORNADO          0        2     2.5       1000       0          0
 6  TORNADO          0        6     2.5       1000       0          0
 7  TORNADO          0        1     2.5       1000       0          0
 8  TORNADO          0        0     2.5       1000       0          0
 9  TORNADO          1       14    25.0       1000       0          0
 10 TORNADO          0        0    25.0       1000       0          0
    Property Crops
 1     25000     0
 2      2500     0
 3     25000     0
 4      2500     0
 5      2500     0
 6      2500     0
 7      2500     0
 8      2500     0
 9     25000     0
 10    25000     0

I want to group the data frame by EVTYPE. When I use the dplyr package and call group_by(EVTYPE) followed by summarize(TotalInjuries=sum(INJURIES), TotalFatalities=sum(FATALITIES)), the result is not grouped by EVTYPE. Instead, I get a single row of overall totals:

  TotalInjuries TotalFatalities
1        140528           15145

I tried changing EVTYPE from 'factor' to 'character' and still got the same result. Please help me troubleshoot this oddity!

Upvotes: 0

Views: 561

Answers (1)

akhmed

Reputation: 3635

It is hard to say exactly what is going on without a reproducible example. You may be using the dplyr syntax incorrectly. The following small example, built on a toy version of your data, works for me:

damageData <- data.frame(
  EVTYPE = factor(c("Y","N","Y","N","Y","N","Y","N","Y","N")),
  FATALITIES = c(0,0,0,0,0,0,0,0,1,0),
  INJURIES = c(15,0,2,2,2,6,1,0,14,0))

str(damageData)

library(dplyr)

damageData %>%
  group_by( EVTYPE ) %>%
  summarize( TotalInjuries=sum(INJURIES),
             TotalFatalities=sum(FATALITIES))

Running this, I get the following grouped result:

Source: local data frame [2 x 3]  

  EVTYPE TotalInjuries TotalFatalities  
1      N             8               0  
2      Y            34               1  
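
If the same pipeline still returns a single row on your machine, one thing worth checking is whether the plyr package is attached after dplyr. I cannot confirm this from the question, so treat it as an assumption: plyr also exports summarise/summarize, and when it masks the dplyr version the grouping is ignored and you get overall totals, which matches your symptom. A minimal sketch of the check, using the same toy data and namespace-qualified calls so that the dplyr functions are used regardless of load order (the names plyr_attached and totals are only illustrative):

```r
library(dplyr)

# Same toy data as in the example above
damageData <- data.frame(
  EVTYPE = factor(c("Y","N","Y","N","Y","N","Y","N","Y","N")),
  FATALITIES = c(0,0,0,0,0,0,0,0,1,0),
  INJURIES = c(15,0,2,2,2,6,1,0,14,0))

# TRUE means plyr is attached in this session and may be masking dplyr verbs
# (assumption: this is the situation in the asker's session)
plyr_attached <- "package:plyr" %in% search()
print(plyr_attached)

# Qualifying the calls with dplyr:: bypasses any masking
totals <- damageData %>%
  dplyr::group_by(EVTYPE) %>%
  dplyr::summarise(TotalInjuries = sum(INJURIES),
                   TotalFatalities = sum(FATALITIES))

print(totals)
```

If plyr_attached is TRUE, either keep the dplyr:: prefix or restart the session and load plyr before dplyr.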

Upvotes: 1
