jjulip
jjulip

Reputation: 1133

How to summarise unique values in columns of a dataframe by multiple nested blocks in R

I have a dataframe with columns for datetime, id, time and depth. I am using ddply to get the mean time and depth for each unique datetime, as there are duplicate datetime rows. However, within each datetime block, there may be more than one unique 'id’ value, and again duplicate rows for individual ids. Therefore, what I need to do for each datetime block is actually calculate the mean of all the unique times taken from within the id blocks. i.e. I first need to get the unique time values from within each id block, then I want to calculate the mean for all of the unique time values returned using this method for each datetime block. I am trying to use %>% to do this, but this is new syntax for me and I am struggling. Any help, or alternative suggestions within the ddply wrapper for datetime would be appreciated. I provide an example below.

> dput(df3)
structure(list(datetime = c("23/03/2017 14:13:45", "23/03/2017 14:13:45", 
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:13:45", 
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:13:45", 
"23/03/2017 14:13:45", "23/03/2017 14:13:45", "23/03/2017 14:15:15", 
"23/03/2017 14:15:15", "23/03/2017 14:15:15", "23/03/2017 14:15:15", 
"23/03/2017 14:15:45", "23/03/2017 14:15:45", "23/03/2017 14:16:15", 
"23/03/2017 14:16:15", "23/03/2017 14:16:15", "23/03/2017 14:16:15", 
"23/03/2017 14:16:15", "23/03/2017 14:16:15", "23/03/2017 14:16:15"
), id = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 
12L, 12L, 13L, 14L, 14L, 15L, 16L, 16L, 16L, 17L, 18L, 18L), 
    time = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
    3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 2L, 2L), dep = c(0.448675132, 
    0.448675132, 0.448675132, 0.448675132, 0.448675132, 0.448675132, 
    0.448675132, 0.448675132, 0.448675132, 0.448675132, 0.285520539, 
    0.285520539, 0.285520539, 0.285520539, 0.316112025, 0.316112025, 
    0.326309187, 0.356900674, 0.356900674, 0.356900674, 0.38749216, 
    0.326309187, 0.326309187)), class = "data.frame", row.names = c(NA, 
-23L))

My attempt which doesn’t work:

#convert datetime to POSIXct
df3$datetime = as.POSIXct(strptime(df3$datetime, format="%d/%m/%Y %H:%M:%S"), tz="UTC")

#Now condense the dateframe by unique datetime summarising tim and dep cols
  dfCondensed = ddply(df3, .(datetime), summarise,
                      #get the mean time for each unique datetime, but calculate this using 
                      #all the unique time values found within each unique id 
                      meantime = group_by(id) %>% unique(time) %>% mean(),
                      #do the same as above but for dep
                      meandep = group_by(id) %>% unique(dep) %>% mean())

Desired output

> dfCondensed
             datetime   tim       dep
1 23/03/2017 14:13:45 10.00 0.4486751
2 23/03/2017 14:15:15  2.00 0.2855205
3 23/03/2017 14:15:45  2.00 0.3161120
4 23/03/2017 14:16:15  1.75 0.3492528

Upvotes: 1

Views: 312

Answers (3)

akrun
akrun

Reputation: 887951

We can use base R

df4 <- unique(df3)
by(df4[c('time', 'dep')], df4[c('datetime')], FUN = colMeans)

Or with aggregate from base R

aggregate(cbind(time, dep) ~ datetime, df4, mean)
#     datetime  time       dep
#1 23/03/2017 14:13:45 10.00 0.4486751
#2 23/03/2017 14:15:15  2.00 0.2855205
#3 23/03/2017 14:15:45  2.00 0.3161120
#4 23/03/2017 14:16:15  1.75 0.3492528

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389325

I think you are looking for :

library(dplyr)

df3 %>%
   distinct() %>%
   group_by(datetime) %>%
   summarise(dep = mean(dep), mean = mean(time))

#  datetime              dep  mean
#  <chr>               <dbl> <dbl>
#1 23/03/2017 14:13:45 0.449 10   
#2 23/03/2017 14:15:15 0.286  2   
#3 23/03/2017 14:15:45 0.316  2   
#4 23/03/2017 14:16:15 0.349  1.75

Upvotes: 2

Wimpel
Wimpel

Reputation: 27792

here is a data.table approach

library(data.table)
setDT(df3)
unique(df3, by = c("datetime", "id"))[, .(mean.time = mean(time),
                                          mean.dep = mean(dep)), 
                                      by = .(datetime)][]

              datetime mean.time  mean.dep
1: 23/03/2017 14:13:45     10.00 0.4486751
2: 23/03/2017 14:15:15      2.00 0.2855205
3: 23/03/2017 14:15:45      2.00 0.3161120
4: 23/03/2017 14:16:15      1.75 0.3492528

Upvotes: 2

Related Questions