Tiptop
Tiptop

Reputation: 623

Sort data with tidyverse based on criteria

I have a dataframe consisting of two columns: ID and date_time. ID indicates who is being recorded and date_time indicates when. See a piece of the dataframe below.

From this dataframe I would like to calculate a new dataframe that has three columns: "Measurement", "ID" and "Date".

  1. “Measurement”. This column should tell what number measurement this was of a given ID. A measurement starts at 23:00:00 and then runs until 22:59:59 the next day. A measurement however starts at random times and the duration of the first measurement is thus not 24 hours. Neither is the last measurement 24 hours.
  2. “ID”. Indicate the ID of a given measurement
  3. “Date”. This column should show the date of the last recording in a given measurement in this format: yyyy.mm.dd.

There should be a row for every unique measurement

structure(list(date_time = c("2020.03.02 22:00:17", "2020.03.02 22:05:17", 
"2020.03.02 22:10:17", "2020.03.02 22:35:17", "2020.03.02 22:40:17", 
"2020.03.02 22:45:17", "2020.03.02 22:50:17", "2020.03.02 22:55:17", 
"2020.03.02 23:00:17", "2020.03.02 23:05:17", "2020.03.02 23:10:17", 
"2020.03.02 23:15:17", "2020.03.02 23:20:17", "2020.03.02 23:25:17", 
"2020.03.02 23:30:17", "2020.03.02 23:35:17", "2020.03.02 23:40:17", 
"2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
"2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17", 
"2020.03.03 01:05:17", "2020.03.03 01:10:17", "2020.03.03 01:15:17", 
"2020.03.03 01:20:17", "2020.03.03 01:25:17", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
"2020.05.09 08:39:32", "2020.05.09 08:39:32"), id = c(12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
13L, 13L, 13L, 13L, 13L)), row.names = c(NA, 46L), class = "data.frame")
#Expected output:
output <- read.table(header=TRUE, text ="
                     ID Date        Measurement
                     12 2020.03.02  1
                     12 2020.03.03  2
                     13 2020.05.09  1
")  

I'm new to R and try to work with tidyverse. All help is much appreciated, thanks!

Upvotes: 0

Views: 443

Answers (2)

YBS
YBS

Reputation: 21349

Assuming df1 is your dataframe, another way to do is...

df1$dateTime = as_datetime(df1$date_time, format = "%Y.%m.%d %H:%M:%S")
df1$mydate = as.Date(df1$date_time, format = "%Y.%m.%d %H:%M:%S")

df1$tm <- as.numeric(df1$dateTime)
df1$dts <- 86400*as.numeric(df1$mydate)

df2 <- df1 %>% group_by(id,mydate) %>% 
         transform(date = case_when(((dts-3600)<tm & tm<(dts+82800) )~paste0(mydate),((dts+82800)<=tm)~paste0(mydate+1) )) %>% 
         select(id,date) %>%   unique() %>% 
         group_by(id) %>% mutate(measurement = row_number())
df2

>df2
# A tibble: 3 x 3
# Groups:   id [2]
     id date       measurement
  <int> <chr>            <int>
1    12 2020-03-02           1
2    12 2020-03-03           2
3    13 2020-05-09           1

Upvotes: 1

NColl
NColl

Reputation: 757

I'm sure there's a better way of doing this but.....

library(tidyverse)

df <- data.frame(
  structure(list(date_time = c("2020.03.02 22:00:17", "2020.03.02 22:05:17", 
                               "2020.03.02 22:10:17", "2020.03.02 22:35:17", "2020.03.02 22:40:17", 
                               "2020.03.02 22:45:17", "2020.03.02 22:50:17", "2020.03.02 22:55:17", 
                               "2020.03.02 23:00:17", "2020.03.02 23:05:17", "2020.03.02 23:10:17", 
                               "2020.03.02 23:15:17", "2020.03.02 23:20:17", "2020.03.02 23:25:17", 
                               "2020.03.02 23:30:17", "2020.03.02 23:35:17", "2020.03.02 23:40:17", 
                               "2020.03.02 23:45:17", "2020.03.02 23:50:17", "2020.03.02 23:55:17", 
                               "2020.03.03 00:00:17", "2020.03.03 00:55:17", "2020.03.03 01:00:17", 
                               "2020.03.03 01:05:17", "2020.03.03 01:10:17", "2020.03.03 01:15:17", 
                               "2020.03.03 01:20:17", "2020.03.03 01:25:17", "2020.05.09 08:39:32", 
                               "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                               "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                               "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                               "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                               "2020.05.09 08:39:32", "2020.05.09 08:39:32", "2020.05.09 08:39:32", 
                               "2020.05.09 08:39:32", "2020.05.09 08:39:32"), id = c(12L, 12L, 
                                                                                     12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                                                                                     12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 
                                                                                     13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 
                                                                                     13L, 13L, 13L, 13L, 13L)), row.names = c(NA, 46L), class = "data.frame")
)

df %>%
  mutate(
    date_time = anytime::anydate(date_time)
  ) %>%
  group_by(id) %>% summarise(date = unique(date_time)) %>%
  mutate(
    Measurement = 1,
    Measurement = cumsum(Measurement)
  )

# A tibble: 3 x 3
# Groups:   id [2]
     id date       Measurement
  <int> <date>           <dbl>
1    12 2020-03-02           1
2    12 2020-03-03           2
3    13 2020-05-09           1

Upvotes: 1

Related Questions