user9532692
user9532692

Reputation: 706

How to count unique values in a data frame in R

I am building a scheduler that calculates the number of hours each person works for a week. The dataframe looks like this:

>df
  Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12      a       c         a        c      b        b      b
2 12-15      b       d         b        d      a        a      e
3 15-18      c       e         c        e      d        e      f
4 18-21      d       f         e        f      e        f      a
5 21-24      e       a         d        d      c        d      d
6 24-03      f       b         f        e      a        b      b
7 03-06      a       c         a        a      b        a      e
8 06-09      b       d         b        f      d        e      f

Additionally, I would like to have people who serve Shift 24-03 to have 4 hours instead of 3 hours. So the result would look something like this:

name hours
a       30
b       34
c       32
d       31.5
e       34
f       33

Upvotes: 1

Views: 521

Answers (3)

lebelinoz
lebelinoz

Reputation: 5068

I think gather from the tidyr package will shape the data into a form you want:

> df1 = df %>% tidyr::gather(key = "weekday", value = "name", -Shift)
   Shift   weekday name
1  09-12    Monday    a
2  12-15    Monday    b
3  15-18    Monday    c
4  18-21    Monday    d
...

Then you can add on a hours column using mutate and ifelse:

df2 = df1 %>% mutate(hours = ifelse(Shift == "24-03", 4, 3))

And the answer is a simple group_by / summarise combo:

answer = df2 %>% group_by(name) %>% summarise(hours = sum(hours))

The answer dataframe will look like this:

name hours
-----------
a    34
b    36
c    18
d    30
e    31
f    26

This isn't quite what you wanted, but I think there's something fishy about your sample data. How can someone have 31.5 hours?

Upvotes: 0

Thomas Mailund
Thomas Mailund

Reputation: 1826

If you make sure your columns are strings and not factors, you can unlist the week-day columns to get a plain vector and then use table to count

df <- read.table(text="Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 09-12      a       c         a        c      b        b      b
2 12-15      b       d         b        d      a        a      e
3 15-18      c       e         c        e      d        e      f
4 18-21      d       f         e        f      e        f      a
5 21-24      e       a         d        d      c        d      d
6 24-03      f       b         f        e      a        b      b
7 03-06      a       c         a        a      b        a      e
8 06-09      b       d         b        f      d        e      f",
                 stringsAsFactors = FALSE)

plain_vec <- unlist(df[,2:8], use.names = FALSE)

This gets you to

> table(plain_vec)
plain_vec
 a  b  c  d  e  f 
11 11  6 10 10  8 

To count hours, you can replicate rows as many times as you have hours adn adjust the 24-03 by adding an extra row for that.

> table(unlist(df[c(rep(c(1:5,7:8), each=3), rep(8,4)), 2:8], use.names=FALSE))

 a  b  c  d  e  f 
30 32 18 38 31 26 

The use.names=FALSE is just something I usually do with unlist. You don't need it, but usually your code is much faster if vectors do not have to carry their names along with them.

Upvotes: 0

akrun
akrun

Reputation: 887118

We can gather into 'long' format, separate the 'shift' into numeric columns, then grouped by 'name', get the difference of the two columns and sum it

library(tidyverse)
gather(df, key, name, -Shift) %>%
     separate(Shift, into = c("Start", "End"), convert = TRUE) %>% 
     mutate(End = ifelse(End < Start, Start + End, End)) %>% 
     group_by(name) %>% 
     summarise(hours = sum(End - Start))

Upvotes: 2

Related Questions