Reputation: 1115
I have some data in the following format, where multiple sensors have data with timestamps. The is some overlap on timestamps between each sensor, however there are gaps in some of the sensors where no data is present at all. I am looking to subset the data in a way where I only pull timestamps/rows where there is a record for each sensor. Here is a small example:
library(lubridate)
set.seed(88)
start_date = as.POSIXct("2018-08-13 17:30:00 GMT")
datetime = seq(start_date, start_date + minutes(20), by = "5 min")
df <- data.frame(Datetime = datetime, X = runif(15), Sensor =
c(rep(1,5),rep(2,3),rep(3,4),rep(4,3)))
df$Datetime[12] <- df$Datetime[12] + minutes(5)
df
Datetime X Sensor
1 2018-08-13 17:30:00 0.71008793 1
2 2018-08-13 17:35:00 0.23754185 1
3 2018-08-13 17:40:00 0.43144271 1
4 2018-08-13 17:45:00 0.06922129 1
5 2018-08-13 17:50:00 0.78754994 1
6 2018-08-13 17:30:00 0.74219767 2
7 2018-08-13 17:35:00 0.49680693 2
8 2018-08-13 17:40:00 0.28491360 2
9 2018-08-13 17:45:00 0.62937634 3
10 2018-08-13 17:50:00 0.28642922 3
11 2018-08-13 17:30:00 0.44928292 3
12 2018-08-13 17:40:00 0.97428628 3
13 2018-08-13 17:40:00 0.10112843 4
14 2018-08-13 17:45:00 0.89788230 4
15 2018-08-13 17:50:00 0.53922885 4
Desired output here would be:
Datetime X Sensor
3 2018-08-13 17:40:00 0.4314427 1
8 2018-08-13 17:40:00 0.2849136 2
12 2018-08-13 17:40:00 0.9742863 3
13 2018-08-13 17:40:00 0.1011284 4
Upvotes: 3
Views: 47
Reputation: 28825
We can do it in dplyr
by keeping only dates that have entries equal to number of sensors;
df %>% group_by(Datetime) %>% filter(n()==n_distinct(.$Sensor))
#> # A tibble: 4 x 3
#> # Groups: Datetime [1]
#> Datetime X Sensor
#> <dttm> <dbl> <dbl>
#> 1 2018-08-13 17:40:00 0.741 1
#> 2 2018-08-13 17:40:00 0.760 2
#> 3 2018-08-13 17:40:00 0.744 3
#> 4 2018-08-13 17:40:00 0.553 4
Another method in base
would be converting the data from long to wide and keeping only completed cases (the ones with data for all Sensors
) and then convert the results back to long format;
wide <- reshape(df, idvar = "Datetime", timevar = "Sensor", direction = "wide")
reshape(na.omit(wide), direction = "long", varying = list(names(wide)[-1]), v.names = "X",
idvar = "Datetime", timevar = "Sensor", times = unique(df$Sensor)); rm(wide)
#> Datetime Sensor X
#> 2018-08-13 17:40:00.1 2018-08-13 17:40:00 1 0.7410448
#> 2018-08-13 17:40:00.2 2018-08-13 17:40:00 2 0.7602078
#> 2018-08-13 17:40:00.3 2018-08-13 17:40:00 3 0.7441643
#> 2018-08-13 17:40:00.4 2018-08-13 17:40:00 4 0.5529621
Created on 2019-05-31 by the reprex package (v0.3.0)
Upvotes: 2