Reputation: 515
My real data frame have more than 90000 rows and 30 Ptt
And then, I have one data frame like this:
Ptt Date Area
88734 2016-10-23 05:39:18 BA
88734 2016-10-23 06:53:13 BA
88734 2016-11-09 08:32:18 MI
88734 2016-11-19 06:45:27 MI
88734 2016-12-20 12:30:43 MI
88734 2016-12-29 02:45:35 FA
129041 2017-10-05 04:55:24 BA
129041 2016-10-23 06:45:30 MI
129041 2016-11-16 07:10:32 FA
129041 2016-11-29 03:43:54 FA
120941 2017-01-02 14:54:39 FA
dt = data.table(Ptt= c("88734", "88734", "88734", "88734", "88734", "88734", "120941", "120941","120941","120941","120941"),
date = c("2016-10-23 05:39:18",
"2016-10-23 06:53:13 ",
"2016-11-09 08:32:18",
"2016-11-19 06:45:27",
"2016-12-20 12:30:43",
"2016-12-29 02:45:35",
"2017-10-05 04:55:24",
"2016-10-23 06:45:30",
"2016-11-16 07:10:32",
"2016-11-29 03:43:54",
"2017-01-02 14:54:39"),
Area = c("BA", "BA", "MI", "MI", "MI", "FA", "BA", "MI", "FA", "FA", "FA"))
And I wanna two things. First, count how many days each Ptt
have at each area
. And second, how many days have at each area
, but I don't have idea how to do this, someone know how?
About the first one, for example: 88734
have 1 day at BA
, 3 days at MI
and 1 day at FA
, and so on.
Another example (not real): I have 2 repeated days for 88734
, 2016-10-23
, 2016-10-23
, and then 2016-11-09
, 2016-11-09
, 2016-11-09
(3 repeated days here), and then 2016-12-29
(1 day here). So, this 88734
have at total 3 days counting, not 6 days.
About the second, even have the same date
for many Ptt
, I need to sum all, I guess is like, sum all days for 88734
at BA
, and then sum all days for 129031
, and then put all together. (I don't know if is the right reasoning)
For example (not real), 88734
have 10/08/2017, 10/08/2017 at BA
, but 129041
have 10/08/2017, 10/08/2017, 10/08/2017 too at BA
. So, have 2 days at BA
, not 1.
I want this: (could be 2 news data frame)
#First one
Ptt Area Days
88734 BA 1
88734 MI 3
88734 FA 1
129041 BA 1
129041 MI 1
120941 FA 3
#Second one
Area Days
BA 2
MI 4
FA 4
Thanks!
Upvotes: 0
Views: 462
Reputation: 388907
Create a column that only has dates in it.
dt$date_col <- as.Date(as.POSIXct(dt$date, format = "%Y-%m-%d %T"))
After that, you can count the number of unique dates for each Ptt
& Area
and only Area
.
You can use dplyr
, base R or data.table
to do this.
dplyr:
library(dplyr)
dt %>% group_by(Ptt, Area) %>% summarise(Days = n_distinct(date_col))
# Ptt Area Days
# <chr> <chr> <int>
#1 120941 BA 1
#2 120941 FA 3
#3 120941 MI 1
#4 88734 BA 1
#5 88734 FA 1
#6 88734 MI 3
and
dt %>% group_by(Area) %>% summarise(Days = n_distinct(date_col))
# Area Days
# <chr> <int>
#1 BA 2
#2 FA 4
#3 MI 4
Base R :
aggregate(date_col~Ptt + Area, dt, function(x) length(unique(x)))
and
aggregate(date_col~Area, dt, function(x) length(unique(x)))
data.table :
library(data.table)
setDT(dt)[, .(days = uniqueN(date_col)), .(Ptt, Area)]
and
setDT(dt)[, .(days = uniqueN(date_col)), Area]
Upvotes: 0
Reputation: 16978
You could use dplyr
and lubridate
to get your desired output:
dt %>%
mutate(date = as_date(date)) %>%
distinct() %>%
count(Ptt, Area, name="Days")
returns
Ptt Area Days
1: 120941 BA 1
2: 120941 FA 3
3: 120941 MI 1
4: 88734 BA 1
5: 88734 FA 1
6: 88734 MI 3
and
dt %>%
mutate(date = as_date(date)) %>%
distinct() %>%
count(Area, name="Days")
gives
Area Days
1: BA 2
2: FA 4
3: MI 4
Upvotes: 1