Oriol Prat

Reputation: 1047

Mean of time - hh:mm:ss - group by a variable

I need to calculate the mean of Time by Country. Time is a time variable in hh:mm:ss format.

This command, with(df,tapply(as.numeric(times(df$Time)),Country,mean)), does not return the mean in hh:mm:ss. The sample below shows the first rows of the data.

    Country Time
1   Germany 2:26:21
2   Germany 2:19:19
3   Brazil  2:06:34
4   USA     2:06:17
5   Eth     2:18:58
6   Japan   2:08:35
7   Morocco 2:05:27
8   Germany 2:13:57
9   Romania 2:21:30
10  Spain   2:07:23

Output:

> with(df,tapply(as.numeric(times(df$Time)),Country,mean))
      Andorra     Australia        Brazil        Canada         China 
   0.09334491    0.09634259    0.09578125    0.09634645    0.09481192 
      Eritrea      Ethiopia        France       Germany Great Britain 
   0.09709491    0.09010031    0.10025463    0.09713349    0.09524306 
      Ireland         Italy         Japan         Kenya       Morocco 
   0.09593750    0.09520255    0.09579630    0.08934854    0.09400463 
   New Zeland          Peru        Poland       Romania        Russia 
   0.09664931    0.09809606    0.09638889    0.09875000    0.09327932 
        Spain   Switzerland        Uganda United States      Zimbabwe 
   0.09314236    0.09620949    0.10068287    0.09399016    0.09892940 

Upvotes: 3

Views: 902

Answers (2)

jlhoward

Reputation: 59345

I see you've discovered the agony of working with date and time values in R...

Is this what you had in mind?

df$nTime <- difftime(strptime(df$Time,"%H:%M:%S"),
                     strptime("00:00:00","%H:%M:%S"),
                     units="secs")
df.means <- aggregate(df$nTime,by=list(df$Country),mean)
df.means$Time <- format(.POSIXct(df.means$x,tz="GMT"), "%H:%M:%S")
df.means
#   Group.1        x      Time
# 1  Brazil 7594.000  02:06:34
# 2     Eth 8338.000  02:18:58
# 3 Germany 8392.333  02:19:52
# 4   Japan 7715.000  02:08:35
# 5 Morocco 7527.000  02:05:27
# 6 Romania 8490.000  02:21:30
# 7   Spain 7643.000  02:07:23
# 8     USA 7577.000  02:06:17

The first line adds a column nTime, which is the time in seconds since midnight. The second line calculates the mean for each country. The third line converts the means back to H:M:S.
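As a small sketch of the third step, the conversion from seconds back to hh:mm:ss can also be written with plain arithmetic, which avoids time zones and keeps working if a value ever exceeds 24 hours. The helper name secs_to_hms is made up for this example and is not part of any package. It rounds to the nearest second, whereas formatting through .POSIXct truncates.

```r
# Hypothetical helper: format a number of seconds as hh:mm:ss.
secs_to_hms <- function(secs) {
  secs <- round(as.numeric(secs))   # round to whole seconds
  h <- secs %/% 3600                # whole hours
  m <- (secs %% 3600) %/% 60        # remaining whole minutes
  s <- secs %% 60                   # remaining seconds
  sprintf("%02d:%02d:%02d", as.integer(h), as.integer(m), as.integer(s))
}

secs_to_hms(c(7594, 8392.333))      # "02:06:34" "02:19:52"

# chron's times() stores a time as a fraction of a day, so the numbers
# printed in the question become seconds after multiplying by 86400.
secs_to_hms(0.09713349 * 86400)     # "02:19:52" (Germany)
```

The last call suggests that the values in the question's output were already the per-country means, only expressed as fractions of a day rather than as hh:mm:ss.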

The problem you were having is that strptime(...), when forced to convert to numeric, returns the number of seconds between 1970-01-01 and the indicated time today. So, a really big number. This code just subtracts out the number of seconds between 1970-01-01 and 00:00:00 today.
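To make the size of those numbers concrete, here is a minimal base-R sketch using the first time from the question. The tz = "GMT" argument is an addition of this example, so that the result does not depend on the local time zone.

```r
# strptime() fills in today's date when the string contains only a time.
t1       <- as.POSIXct(strptime("2:26:21",  "%H:%M:%S", tz = "GMT"))
midnight <- as.POSIXct(strptime("00:00:00", "%H:%M:%S", tz = "GMT"))

as.numeric(t1)                         # seconds since 1970-01-01, over a billion
as.numeric(t1) - as.numeric(midnight)  # 8781 seconds, i.e. 2:26:21 after midnight
```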

Upvotes: 2

TheComeOnMan

Reputation: 12875

Is this what you are trying to do?

dades$Time <- strptime(dades$Time,'%H:%M:%S')
by(dades$Time, dades$Country, mean)

If I didn't understand your question, could you please post a sample of the expected output?
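If the goal is one hh:mm:ss value per country, the following sketch shows one possible way to get there from parsed times. It rebuilds the ten sample rows of the question as dades (the name used above). The tz = "GMT" argument and the secs and means variables are assumptions of this example.

```r
dades <- data.frame(
  Country = c("Germany", "Germany", "Brazil", "USA", "Eth",
              "Japan", "Morocco", "Germany", "Romania", "Spain"),
  Time    = c("2:26:21", "2:19:19", "2:06:34", "2:06:17", "2:18:58",
              "2:08:35", "2:05:27", "2:13:57", "2:21:30", "2:07:23"),
  stringsAsFactors = FALSE
)

# Parse to POSIXct (today's date is filled in) and keep the raw seconds.
secs <- as.numeric(as.POSIXct(strptime(dades$Time, "%H:%M:%S", tz = "GMT")))

# Average within each country, then format the means as clock times.
means <- sapply(split(secs, dades$Country), mean)
format(.POSIXct(means, tz = "GMT"), "%H:%M:%S")
```

Because every parsed value carries the same date, the date drops out again when the mean is formatted with "%H:%M:%S".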

Upvotes: 1
