Reputation: 706
I am building a scheduler that calculates the number of hours each person works for a week. The dataframe looks like this:
>df
Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12 a c a c b b b
2 12-15 b d b d a a e
3 15-18 c e c e d e f
4 18-21 d f e f e f a
5 21-24 e a d d c d d
6 24-03 f b f e a b b
7 03-06 a c a a b a e
8 06-09 b d b f d e f
Additionally, I would like to have people who serve Shift 24-03 to have 4 hours instead of 3 hours. So the result would look something like this:
name hours
a 30
b 34
c 32
d 31.5
e 34
f 33
Upvotes: 1
Views: 521
Reputation: 5068
I think gather
from the tidyr
package will shape the data into a form you want:
> df1 = df %>% tidyr::gather(key = "weekday", value = "name", -Shift)
Shift weekday name
1 09-12 Monday a
2 12-15 Monday b
3 15-18 Monday c
4 18-21 Monday d
...
Then you can add on a hours
column using mutate
and ifelse
:
df2 = df1 %>% mutate(hours = ifelse(Shift == "24-03", 4, 3))
And the answer is a simple group_by
/ summarise
combo:
answer = df2 %>% group_by(name) %>% summarise(hours = sum(hours))
The answer
dataframe will look like this:
name hours
-----------
a 34
b 36
c 18
d 30
e 31
f 26
This isn't quite what you wanted, but I think there's something fishy about your sample data. How can someone have 31.5 hours?
Upvotes: 0
Reputation: 1826
If you make sure your columns are strings and not factors, you can unlist the week-day columns to get a plain vector and then use table
to count
df <- read.table(text="Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12 a c a c b b b
2 12-15 b d b d a a e
3 15-18 c e c e d e f
4 18-21 d f e f e f a
5 21-24 e a d d c d d
6 24-03 f b f e a b b
7 03-06 a c a a b a e
8 06-09 b d b f d e f",
stringsAsFactors = FALSE)
plain_vec <- unlist(df[,2:8], use.names = FALSE)
This gets you to
> table(plain_vec)
plain_vec
a b c d e f
11 11 6 10 10 8
To count hours, you can replicate rows as many times as you have hours adn adjust the 24-03 by adding an extra row for that.
> table(unlist(df[c(rep(c(1:5,7:8), each=3), rep(8,4)), 2:8], use.names=FALSE))
a b c d e f
30 32 18 38 31 26
The use.names=FALSE
is just something I usually do with unlist
. You don't need it, but usually your code is much faster if vectors do not have to carry their names along with them.
Upvotes: 0
Reputation: 887118
We can gather
into 'long' format, separate
the 'shift' into numeric columns, then grouped by 'name', get the difference of the two columns and sum
it
library(tidyverse)
gather(df, key, name, -Shift) %>%
separate(Shift, into = c("Start", "End"), convert = TRUE) %>%
mutate(End = ifelse(End < Start, Start + End, End)) %>%
group_by(name) %>%
summarise(hours = sum(End - Start))
Upvotes: 2