Sort data with tidyverse based on criteria

Question

I have a dataframe consisting of two columns: ID and date_time. ID indicates who is being recorded and date_time indicates when. See a piece of the dataframe below.

From this dataframe I would like to calculate a new dataframe that has three columns: "Measurement", "ID" and "Date".

“Measurement”. This column should tell what number measurement this was of a given ID. A measurement starts at 23:00:00 and then runs until 22:59:59 the next day. A measurement however starts at random times and the duration of the first measurement is thus not 24 hours. Neither is the last measurement 24 hours.
“ID”. Indicate the ID of a given measurement
“Date”. This column should show the date of the last recording in a given measurement in this format: yyyy.mm.dd.

There should be a row for every unique measurement

structure(list(date_time = c("2020.03.02 22:00:17", "2020.03.02 22:05:17", 
"2020.03.02 22:10:17", "2020.03.02 22:35:17", "2020.03.02 22:40:17", 
"2020.03.02 22:45:17", "2020.03.02 22:50:17", "2020.03.02 22:55:17", 
"2020.03.02 23:00:17", "2020.03.02 23:05:17", "2020.03.02 23:10:17", 
"2020.03.02 23:15:17", "2020.03.02 23:20:17", "2020.03.02 23:25:17", 
"2020.03.02 23:30:17", "2020.03.02 23:35:17", "2020.03.02 23:40:17", 
"2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
"2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17", 
"2020.03.03 01:05:17", "2020.03.03 01:10:17", "2020.03.03 01:15:17", 
"2020.03.03 01:20:17", "2020.03.03 01:25:17", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32"), id = c(12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 13L, 13L, 13L, 13L)), row.names = c(NA, 46L), class = "data.frame")
#Expected output:
output <- read.table(header=TRUE, text ="
                     ID Date        Measurement
                     12 2020.03.02  1
                     12 2020.03.03  2
                     13 2020.05.09  1
")

I'm new to R and try to work with tidyverse. All help is much appreciated, thanks!

YBS · Accepted Answer

Assuming df1 is your dataframe, another way to do is...

df1$dateTime = as_datetime(df1$date_time, format = "%Y.%m.%d %H:%M:%S")
df1$mydate = as.Date(df1$date_time, format = "%Y.%m.%d %H:%M:%S")

df1$tm <- as.numeric(df1$dateTime)
df1$dts <- 86400*as.numeric(df1$mydate)

df2 <- df1 %>% group_by(id,mydate) %>% 
         transform(date = case_when(((dts-3600)% 
         select(id,date) %>%   unique() %>% 
         group_by(id) %>% mutate(measurement = row_number())
df2

>df2
# A tibble: 3 x 3
# Groups:   id [2]
     id date       measurement
               
1    12 2020-03-02           1
2    12 2020-03-03           2
3    13 2020-05-09           1

Sort data with tidyverse based on criteria

Answers (2)

Related Questions