Find number counting days by condition from another column, R

My real data frame have more than 90000 rows and 30 Ptt

And then, I have one data frame like this:

Ptt      Date                      Area        
88734    2016-10-23 05:39:18       BA             
88734    2016-10-23 06:53:13       BA           
88734    2016-11-09 08:32:18       MI            
88734    2016-11-19 06:45:27       MI           
88734    2016-12-20 12:30:43       MI           
88734    2016-12-29 02:45:35       FA             
129041   2017-10-05 04:55:24       BA            
129041   2016-10-23 06:45:30       MI            
129041   2016-11-16 07:10:32       FA            
129041   2016-11-29 03:43:54       FA           
120941   2017-01-02 14:54:39       FA    



dt = data.table(Ptt= c("88734", "88734", "88734", "88734", "88734", "88734", "120941", "120941","120941","120941","120941"),
                date = c("2016-10-23 05:39:18",
                         "2016-10-23 06:53:13 ", 
                         "2016-11-09 08:32:18",
                         "2016-11-19 06:45:27",
                         "2016-12-20 12:30:43",
                         "2016-12-29 02:45:35",
                         "2017-10-05 04:55:24",
                         "2016-10-23 06:45:30",
                         "2016-11-16 07:10:32",
                         "2016-11-29 03:43:54",
                         "2017-01-02 14:54:39"),
                Area = c("BA", "BA", "MI", "MI", "MI", "FA", "BA", "MI", "FA", "FA", "FA"))  

And I wanna two things. First, count how many days each Ptt have at each area. And second, how many days have at each area, but I don't have idea how to do this, someone know how?

About the first one, for example: 88734 have 1 day at BA, 3 days at MI and 1 day at FA, and so on.

Another example (not real): I have 2 repeated days for 88734, 2016-10-23, 2016-10-23, and then 2016-11-09, 2016-11-09, 2016-11-09 (3 repeated days here), and then 2016-12-29 (1 day here). So, this 88734 have at total 3 days counting, not 6 days.

About the second, even have the same date for many Ptt, I need to sum all, I guess is like, sum all days for 88734 at BA, and then sum all days for 129031, and then put all together. (I don't know if is the right reasoning)

For example (not real), 88734 have 10/08/2017, 10/08/2017 at BA, but 129041 have 10/08/2017, 10/08/2017, 10/08/2017 too at BA. So, have 2 days at BA, not 1.

I want this: (could be 2 news data frame)

#First one
Ptt      Area        Days
88734    BA           1
88734    MI           3 
88734    FA           1  
129041   BA           1 
129041   MI           1 
120941   FA           3

#Second one
Area     Days
BA        2
MI        4
FA        4

Thanks!

Upvotes: 0

Views: 462

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388907

Create a column that only has dates in it.

dt$date_col <- as.Date(as.POSIXct(dt$date, format = "%Y-%m-%d %T"))

After that, you can count the number of unique dates for each Ptt & Area and only Area.

You can use dplyr, base R or data.table to do this.

dplyr:

library(dplyr)
dt %>% group_by(Ptt, Area) %>% summarise(Days = n_distinct(date_col))

#  Ptt    Area   Days
#  <chr>  <chr> <int>
#1 120941 BA        1
#2 120941 FA        3
#3 120941 MI        1
#4 88734  BA        1
#5 88734  FA        1
#6 88734  MI        3

and

dt %>% group_by(Area) %>% summarise(Days = n_distinct(date_col))

# Area   Days
#  <chr> <int>
#1 BA        2
#2 FA        4
#3 MI        4

Base R :

aggregate(date_col~Ptt + Area, dt, function(x) length(unique(x)))

and

aggregate(date_col~Area, dt, function(x) length(unique(x)))

data.table :

library(data.table)
setDT(dt)[, .(days = uniqueN(date_col)), .(Ptt, Area)]

and

setDT(dt)[, .(days = uniqueN(date_col)), Area]

Upvotes: 0

Martin Gal
Martin Gal

Reputation: 16978

You could use dplyr and lubridate to get your desired output:

dt %>%
  mutate(date = as_date(date)) %>%
  distinct() %>%
  count(Ptt, Area, name="Days")

returns

      Ptt Area Days
1: 120941   BA 1
2: 120941   FA 3
3: 120941   MI 1
4:  88734   BA 1
5:  88734   FA 1
6:  88734   MI 3

and

dt %>%
  mutate(date = as_date(date)) %>%
  distinct() %>%
  count(Area, name="Days")

gives

   Area Days
1:   BA 2
2:   FA 4
3:   MI 4

Upvotes: 1

Related Questions