HJ Liang

Reputation: 23

How to merge/stack observations by date in R

I have a data frame like

date             X1 X2 X3
4/16/2019 0:00   1  2  3
4/16/2019 7:00   1  2  3
4/17/2019 0:00   1  2  3
4/17/2019 7:00   1  2  3

I would like to get

date        time     X1      X2      X3
4/16/2019   c(0,7)   c(1,1)  c(2,2)  c(3,3)
4/17/2019   c(0,7)   c(1,1)  c(2,2)  c(3,3)

where X1 is a list and X1[[1]] is a vector, that is c(1,1).

Is there an efficient way to achieve this? Thank you!

Upvotes: 2

Views: 110

Answers (2)

TarJae

Reputation: 78917

Here is an alternative approach. The logic:

  1. Separate the date and time columns (by a route other than separate(), which akrun has already provided).
  2. Group by date.
  3. Summarise with across(), using list and a lambda that calls paste (note the .names argument in summarise).
  4. Use across() again with a lambda that calls paste0.

library(dplyr)
library(readr)
library(lubridate)  # provides mdy_hm() and hour()

# df is the data frame from the question
df %>% 
  mutate(date = mdy_hm(date)) %>% 
  # take the hour first, then reduce the date-time to a plain Date
  mutate(time = parse_number(sprintf("%02d", hour(date))), .before=2,
         date = as.Date(date)) %>% 
  group_by(date) %>% 
  summarise(across(everything(), list(~paste(.,collapse=",")), .names="{col}")) %>% 
  mutate(across(-date, ~paste0("c(",.,")")))

  date       time   X1     X2     X3    
  <date>     <chr>  <chr>  <chr>  <chr> 
1 2019-04-16 c(0,7) c(1,1) c(2,2) c(3,3)
2 2019-04-17 c(0,7) c(1,1) c(2,2) c(3,3)
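
Note that the columns in the output above are character strings such as "c(1,1)", not vectors. If actual list columns are wanted, as the question describes, the same lubridate-based split can feed list() instead of paste(). This is only a minimal sketch on the question's sample data; the names stamp and out are introduced here for illustration.

```r
library(dplyr)
library(lubridate)

# Sample data matching the question
df <- data.frame(
  date = c("4/16/2019 0:00", "4/16/2019 7:00",
           "4/17/2019 0:00", "4/17/2019 7:00"),
  X1 = c(1L, 1L, 1L, 1L),
  X2 = c(2L, 2L, 2L, 2L),
  X3 = c(3L, 3L, 3L, 3L)
)

out <- df %>%
  mutate(stamp = mdy_hm(date),      # parse "month/day/year hour:minute"
         time  = hour(stamp),       # hour of day
         date  = as.Date(stamp)) %>%  # calendar date only
  select(date, time, X1, X2, X3) %>%  # drop the helper column
  group_by(date) %>%
  summarise(across(everything(), ~ list(.x)))  # one vector per date and column

out$X1[[1]]  # the vector of X1 values for the first date
```

Each non-date column of out is then a list whose elements are ordinary vectors, so out$X1[[1]] can be used directly in further computation.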

Upvotes: 1

akrun

Reputation: 886948

Split the 'date' column into 'date' and 'time' columns at the whitespace (\\s+), group by 'date', then summarise across all remaining columns by wrapping each one in a list.

library(dplyr)
library(tidyr)
library(stringr)
df1 %>%   
   separate(date, into = c('date', 'time'), sep = '\\s+') %>%
   # "7:00" becomes "7.00", which as.numeric() reads as 7
   mutate(time = as.numeric(str_replace(time, ":", "."))) %>%
   group_by(date) %>%
   summarise(across(everything(), ~ list(.)))

-output

# A tibble: 2 × 5
  date      time      X1        X2        X3       
  <chr>     <list>    <list>    <list>    <list>   
1 4/16/2019 <dbl [2]> <int [2]> <int [2]> <int [2]>
2 4/17/2019 <dbl [2]> <int [2]> <int [2]> <int [2]>
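
To confirm that each cell really holds a vector, as the question asks, the result can be stored and its list columns indexed with [[ ]]. The following is a minimal sketch that repeats the pipeline on the same sample data as the data section of this answer; the names res and long are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same sample data as in the "data" section of this answer
df1 <- data.frame(
  date = c("4/16/2019 0:00", "4/16/2019 7:00",
           "4/17/2019 0:00", "4/17/2019 7:00"),
  X1 = c(1L, 1L, 1L, 1L),
  X2 = c(2L, 2L, 2L, 2L),
  X3 = c(3L, 3L, 3L, 3L)
)

res <- df1 %>%
  separate(date, into = c("date", "time"), sep = "\\s+") %>%
  mutate(time = as.numeric(str_replace(time, ":", "."))) %>%
  group_by(date) %>%
  summarise(across(everything(), ~ list(.)))

res$X1[[1]]    # X1 values for 4/16/2019
res$time[[2]]  # times for 4/17/2019

# unnest() expands the list columns back to one row per observation
long <- res %>% unnest(cols = c(time, X1, X2, X3))
```

One caveat about the time conversion: replacing ":" with "." turns "7:30" into 7.3 rather than 7.5, so it is only a faithful hour value when the minutes are zero, as they are in this data.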

data

df1 <- structure(list(
  date = c("4/16/2019 0:00", "4/16/2019 7:00",
           "4/17/2019 0:00", "4/17/2019 7:00"),
  X1 = c(1L, 1L, 1L, 1L),
  X2 = c(2L, 2L, 2L, 2L),
  X3 = c(3L, 3L, 3L, 3L)),
  class = "data.frame", row.names = c(NA, -4L))

Upvotes: 3
