Reputation: 1133
I have a dataframe with columns for datetime, id, time and depth. I am using ddply to get the mean time and depth for each unique datetime, since there are duplicate datetime rows. However, within each datetime block there may be more than one unique id value, and again duplicate rows for individual ids. So for each datetime block I actually need the mean of the unique time values taken from within the id blocks: first get the unique time values within each id block, then average those unique values across the whole datetime block. I am trying to use %>% to do this, but the syntax is new to me and I am struggling. Any help, or alternative suggestions within the ddply wrapper for datetime, would be appreciated. I provide an example below.
> dput(df3)
structure(list(datetime = c("23/03/2017 14:13:45", "23/03/2017 14:13:45",
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:13:45",
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:13:45",
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:15:15",
"23/03/2017 14:15:15", "23/03/2017 14:15:15", "23/03/2017 14:15:15",
"23/03/2017 14:15:45", "23/03/2017 14:15:45", "23/03/2017 14:16:15",
"23/03/2017 14:16:15", "23/03/2017 14:16:15", "23/03/2017 14:16:15",
"23/03/2017 14:16:15", "23/03/2017 14:16:15", "23/03/2017 14:16:15"
), id = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L,
12L, 12L, 13L, 14L, 14L, 15L, 16L, 16L, 16L, 17L, 18L, 18L),
time = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 2L), dep = c(0.448675132,
0.448675132, 0.448675132, 0.448675132, 0.448675132, 0.448675132,
0.448675132, 0.448675132, 0.448675132, 0.448675132, 0.285520539,
0.285520539, 0.285520539, 0.285520539, 0.316112025, 0.316112025,
0.326309187, 0.356900674, 0.356900674, 0.356900674, 0.38749216,
0.326309187, 0.326309187)), class = "data.frame", row.names = c(NA,
-23L))
My attempt, which doesn't work:
#convert datetime to POSIXct
df3$datetime = as.POSIXct(strptime(df3$datetime, format="%d/%m/%Y %H:%M:%S"), tz="UTC")
#Now condense the dataframe by unique datetime, summarising the time and dep cols
dfCondensed = ddply(df3, .(datetime), summarise,
#get the mean time for each unique datetime, but calculate this using
#all the unique time values found within each unique id
meantime = group_by(id) %>% unique(time) %>% mean(),
#do the same as above but for dep
meandep = group_by(id) %>% unique(dep) %>% mean())
> dfCondensed
             datetime meantime   meandep
1 23/03/2017 14:13:45 10.00 0.4486751
2 23/03/2017 14:15:15 2.00 0.2855205
3 23/03/2017 14:15:45 2.00 0.3161120
4 23/03/2017 14:16:15 1.75 0.3492528
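For reference, the intended per-id deduplication can also be expressed directly inside the ddply wrapper by dropping duplicated ids within each datetime block; a minimal sketch, assuming (as stated above) that each id carries a single time and dep value within a datetime block:

```r
library(plyr)

# df3 reconstructed compactly from the dput in the question
df3 <- data.frame(
  datetime = rep(c("23/03/2017 14:13:45", "23/03/2017 14:15:15",
                   "23/03/2017 14:15:45", "23/03/2017 14:16:15"), c(10, 4, 2, 7)),
  id = rep(c(11, 12, 13, 14, 15, 16, 17, 18), c(10, 3, 1, 2, 1, 3, 1, 2)),
  time = rep(c(10, 3, 1, 2, 1, 3, 1, 2), c(10, 3, 1, 2, 1, 3, 1, 2)),
  dep = rep(c(0.448675132, 0.285520539, 0.285520539, 0.316112025, 0.326309187,
              0.356900674, 0.38749216, 0.326309187), c(10, 3, 1, 2, 1, 3, 1, 2))
)

# within each datetime block, keep one row per id (its single unique
# time/dep value), then average those values
dfCondensed <- ddply(df3, .(datetime), summarise,
                     meantime = mean(time[!duplicated(id)]),
                     meandep  = mean(dep[!duplicated(id)]))
```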
Upvotes: 1
Views: 312
Reputation: 887951
We can use base R
df4 <- unique(df3)
by(df4[c('time', 'dep')], df4[c('datetime')], FUN = colMeans)
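Note that by() returns a list-like "by" object rather than a data frame; if a flat result is wanted (an assumption about the desired output shape; `res` and `out` are just illustrative names), the per-group colMeans can be bound back together:

```r
# df3/df4 reconstructed compactly from the dput in the question
df3 <- data.frame(
  datetime = rep(c("23/03/2017 14:13:45", "23/03/2017 14:15:15",
                   "23/03/2017 14:15:45", "23/03/2017 14:16:15"), c(10, 4, 2, 7)),
  id = rep(c(11, 12, 13, 14, 15, 16, 17, 18), c(10, 3, 1, 2, 1, 3, 1, 2)),
  time = rep(c(10, 3, 1, 2, 1, 3, 1, 2), c(10, 3, 1, 2, 1, 3, 1, 2)),
  dep = rep(c(0.448675132, 0.285520539, 0.285520539, 0.316112025, 0.326309187,
              0.356900674, 0.38749216, 0.326309187), c(10, 3, 1, 2, 1, 3, 1, 2))
)
df4 <- unique(df3)

# by() yields one named vector of column means per datetime;
# rbind them into a matrix, then attach the group labels
res <- by(df4[c('time', 'dep')], df4[c('datetime')], FUN = colMeans)
out <- data.frame(datetime = names(res), do.call(rbind, res), row.names = NULL)
```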
Or with aggregate from base R
aggregate(cbind(time, dep) ~ datetime, df4, mean)
# datetime time dep
#1 23/03/2017 14:13:45 10.00 0.4486751
#2 23/03/2017 14:15:15 2.00 0.2855205
#3 23/03/2017 14:15:45 2.00 0.3161120
#4 23/03/2017 14:16:15 1.75 0.3492528
Upvotes: 0
Reputation: 389325
I think you are looking for:
library(dplyr)
df3 %>%
  distinct() %>%
  group_by(datetime) %>%
  summarise(dep = mean(dep), mean = mean(time))
# datetime dep mean
# <chr> <dbl> <dbl>
#1 23/03/2017 14:13:45 0.449 10
#2 23/03/2017 14:15:15 0.286 2
#3 23/03/2017 14:15:45 0.316 2
#4 23/03/2017 14:16:15 0.349 1.75
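distinct() with no arguments removes fully duplicated rows. If duplicate rows for an id could ever differ in some other column, deduplicating on just the datetime/id pair is a slightly more targeted variant of the same idea (column names `meantime`/`meandep` are my own labels):

```r
library(dplyr)

# df3 reconstructed compactly from the dput in the question
df3 <- data.frame(
  datetime = rep(c("23/03/2017 14:13:45", "23/03/2017 14:15:15",
                   "23/03/2017 14:15:45", "23/03/2017 14:16:15"), c(10, 4, 2, 7)),
  id = rep(c(11, 12, 13, 14, 15, 16, 17, 18), c(10, 3, 1, 2, 1, 3, 1, 2)),
  time = rep(c(10, 3, 1, 2, 1, 3, 1, 2), c(10, 3, 1, 2, 1, 3, 1, 2)),
  dep = rep(c(0.448675132, 0.285520539, 0.285520539, 0.316112025, 0.326309187,
              0.356900674, 0.38749216, 0.326309187), c(10, 3, 1, 2, 1, 3, 1, 2))
)

res <- df3 %>%
  distinct(datetime, id, .keep_all = TRUE) %>%  # one row per datetime/id pair
  group_by(datetime) %>%
  summarise(meantime = mean(time), meandep = mean(dep))
```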
Upvotes: 2
Reputation: 27792
Here is a data.table approach
library(data.table)
setDT(df3)
unique(df3, by = c("datetime", "id"))[, .(mean.time = mean(time),
mean.dep = mean(dep)),
by = .(datetime)][]
datetime mean.time mean.dep
1: 23/03/2017 14:13:45 10.00 0.4486751
2: 23/03/2017 14:15:15 2.00 0.2855205
3: 23/03/2017 14:15:45 2.00 0.3161120
4: 23/03/2017 14:16:15 1.75 0.3492528
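The same aggregation can also be written with lapply() over .SD, which avoids spelling out each column and sorts the output via keyby; just an alternative formulation of the answer above, not a behavioural change (`res` is my own name):

```r
library(data.table)

# df3 reconstructed compactly from the dput in the question
df3 <- data.frame(
  datetime = rep(c("23/03/2017 14:13:45", "23/03/2017 14:15:15",
                   "23/03/2017 14:15:45", "23/03/2017 14:16:15"), c(10, 4, 2, 7)),
  id = rep(c(11, 12, 13, 14, 15, 16, 17, 18), c(10, 3, 1, 2, 1, 3, 1, 2)),
  time = rep(c(10, 3, 1, 2, 1, 3, 1, 2), c(10, 3, 1, 2, 1, 3, 1, 2)),
  dep = rep(c(0.448675132, 0.285520539, 0.285520539, 0.316112025, 0.326309187,
              0.356900674, 0.38749216, 0.326309187), c(10, 3, 1, 2, 1, 3, 1, 2))
)
setDT(df3)

# dedupe on (datetime, id), then take the mean of every .SD column
res <- unique(df3, by = c("datetime", "id"))[
  , lapply(.SD, mean), keyby = datetime, .SDcols = c("time", "dep")]
```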
Upvotes: 2