R. Zh
R. Zh

Reputation: 51

Join/Merge data frames on inconsistent time stamps with long and wide data

I have two data frames,

df3 has a time stamp with date and time, user id, and heart rate with multiple observations (long data)

df4 has a time stamp with date only, user id, calories and sleep (wide data)

I would like to combine them so that the data frame with long data can be filled in with the data frame with wide data based on the date and user id

Here is the code for a toy data set with a similar layout

df3 <- data.frame(  time_stamp=c('2016-11-01 10:29:41','2016-11-01 10:53:11','2016-11-02 01:07:54','2016-11-02 02:00:40','2016-11-02 04:01:33','2016-11-02 05:23:53','2016-11-02 13:20:17'),
               users_user_id=c(7,7,7,7,7,7,7),
               avg_heart_rate=c(94,90,88,85,91,89,95))
df4 <- data.frame(  time_stamp=c('2016-11-01','2016-11-02'), users_user_id=c(7,7), calories=c(1800,2000), sleep=c(480,560))
df3$time_stamp <- as.POSIXct(df3$time_stamp)
df4$time_stamp <- as.POSIXct(df4$time_stamp)

I tried splitting the time from the date, but then when I did a full_join on dplyr using the time stamp and userID I was left with a lot of NAs. I tried looking up how to use reshape2 to melt my data? but I'm lost as to how it could help me...

Upvotes: 1

Views: 52

Answers (2)

Phil
Phil

Reputation: 8107

A tidyverse way:

library(tidyr)
library(dplyr)

df3 <- separate(df3, time_stamp, into = c("date_stamp", "time_stamp"), sep = " ")
df3$date_stamp <- as.POSIXct(df3$date_stamp)

left_join(df3, df4, by = c("date_stamp" = "time_stamp", "users_user_id"))


  date_stamp time_stamp users_user_id avg_heart_rate calories sleep
  1 2016-11-01   10:29:41             7             94     1800   480
  2 2016-11-01   10:53:11             7             90     1800   480
  3 2016-11-02   01:07:54             7             88     2000   560
  4 2016-11-02   02:00:40             7             85     2000   560
  5 2016-11-02   04:01:33             7             91     2000   560
  6 2016-11-02   05:23:53             7             89     2000   560
  7 2016-11-02   13:20:17             7             95     2000   560

Upvotes: 1

Chase
Chase

Reputation: 69181

You could create a new column that just contains the date information and merge on that:

df3$date <- as.Date(df3$time_stamp)
df4$date <- as.Date(df4$time_stamp)
merge(df3, df4, by = c("date", "users_user_id"))

Gives you:

        date users_user_id        time_stamp.x avg_heart_rate time_stamp.y calories sleep
1 2016-11-01             7 2016-11-01 10:29:41             94   2016-11-01     1800   480
2 2016-11-01             7 2016-11-01 10:53:11             90   2016-11-01     1800   480
3 2016-11-02             7 2016-11-02 01:07:54             88   2016-11-02     2000   560
4 2016-11-02             7 2016-11-02 02:00:40             85   2016-11-02     2000   560
5 2016-11-02             7 2016-11-02 04:01:33             91   2016-11-02     2000   560
6 2016-11-02             7 2016-11-02 05:23:53             89   2016-11-02     2000   560
7 2016-11-02             7 2016-11-02 13:20:17             95   2016-11-02     2000   560

Upvotes: 0

Related Questions