ike

Reputation: 312

Creating Mean Function for Subset in R

I'm trying to create a function that will take a few parameters and return the total average hourly return. My data set looks like this:

  Location   Time  units
1 Columbus   3:35     12
2 Columbus   3:58    199
3 Chicago    6:10    -45
4 Chicago    6:19     87
5 Detroit   12:05   -200
6 Detroit    0:32     11

What I would like returned would be

Location   Time   units  unitsph
Columbus    7:33    211     27.9
Chicago    12:29     42      3.4
Detroit    12:37   -189    -15.1

while also retaining the other columns. Basically, I want the total units produced and the units per hour for each location.

I tried out

thing <- time %>% group_by(Location) %>% summarize(sum(units))

which returned locations and total units but not units per hour. Then I moved to

thing <- time %>% group_by(Location) %>% summarize(sum(units)) %>% summarize(sum(Time))

which returned

Error in eval(expr, envir, enclos) : object 'Time' not found

I also tried mutate but to no effect:

fin <- mutate(time, as.numeric(sum(Time))/as.numeric(sum(units)))
Error in Summary.factor(c(118L, 131L, 174L, 178L, 57L), na.rm = FALSE) : 
  ‘sum’ not meaningful for factors

Any help here is much appreciated. I also have a few other columns that I'd like to retain (they're geocodes for the locations etc.), but I didn't list those here. If that's important I can add them back in.

Upvotes: 0

Views: 131

Answers (2)

ike

Reputation: 312

I ended up taking part of what @CAFEBABE recommended and modifying it.

I used

mutated_time <- time %>%
    group_by(Location) %>%
    summarize(play = sum(as.numeric(Time)/60),
              unitsph = sum(units))

and that plus

selektor <- as.data.frame(select(distinct(mutated_time), Location,unitsph))

got me where I wanted to go. Thank you all for the many helpful comments.
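
One thing worth checking when adapting this: the error message in the question suggests Time is stored as a factor, and as.numeric() on a factor returns the internal level codes rather than the values shown. A minimal sketch with made-up values (not the real data) illustrates the difference:

```r
# Made-up values, for illustration only
Time <- factor(c("3:35", "3:58", "12:05"))

codes <- as.numeric(Time)     # internal level codes, not the clock times
text  <- as.character(Time)   # the labels as originally entered

print(codes)
print(text)
```

If the codes are not what is wanted, converting with as.character() first and then parsing the hours and minutes is the usual way around it.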

Upvotes: 1

CAFEBABE

Reputation: 4101

Your time is a string object, not a number. You can use

library(dplyr)
data <- data.frame(loc=c("C","C","D","D"),time=c("1:22","1:23","1:24","1:25"),u=c(1,2,3,4))
basetime <- strptime("00:00","%H:%M")
# units="hours" keeps the result in hours even when a time is under one hour
data$in.hours <- as.double(difftime(strptime(data$time,"%H:%M"), basetime, units="hours"))
thing <- data %>% group_by(loc) %>% summarize(sum(u),sum(in.hours))

The conversion into hours is not exactly beautiful: it first parses the time into a POSIXlt object, then takes the difference from a midnight base time and converts that to a double. But it works. The converted data

 loc time u in.hours
1   C 1:22 1 1.366667
2   C 1:23 2 1.383333
3   D 1:24 3 1.400000
4   D 1:25 4 1.416667

so 1.366667 means 1 h + 22/60 h, i.e. 1 hour 22 minutes. The final result is then

    loc sum(u) sum(in.hours)
  (fctr)  (dbl)         (dbl)
1      C      3      2.750000
2      D      7      2.816667

hence for C you have 2 hours and 0.75*60 = 45 minutes.
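
To get from these sums to the units per hour asked for in the question, one more division is needed. The following is a rough sketch on the question's sample data, not a definitive solution: to_hours is a hypothetical helper introduced here, and total_hours and total_units are made-up column names. It splits the "H:MM" strings directly instead of going through date-times.

```r
library(dplyr)

# Sample data copied from the question; Time is kept as text, not a factor
time <- data.frame(
  Location = c("Columbus", "Columbus", "Chicago", "Chicago", "Detroit", "Detroit"),
  Time     = c("3:35", "3:58", "6:10", "6:19", "12:05", "0:32"),
  units    = c(12, 199, -45, 87, -200, 11),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "H:MM" strings to decimal hours
to_hours <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60, numeric(1))
}

result <- time %>%
  mutate(hours = to_hours(Time)) %>%
  group_by(Location) %>%
  summarize(total_hours = sum(hours), total_units = sum(units)) %>%
  mutate(unitsph = total_units / total_hours)

print(result)
```

For Columbus this gives 211 units over 7.55 hours, about 27.9 units per hour, matching the question. If extra columns such as geocodes are constant within a Location, adding them to group_by() is one way to carry them through to the result.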

Upvotes: 2
