user330
user330

Reputation: 1270

calculating average multiple IDs in R

I have the following data:

df<-read.table (text=" id1  df1 id2 df2 id3 df3
22  00:20   44  00:30   78  00:20
33  00:20   45  01:30   78  00:30
22  01:30   44  00:40   78  01:20
32  00:30   45  01:20   90  02:00
22  01:45   44  02:20   90  01:30
32  01:30   44  00:30   50  01:41
", header=TRUE)

As you can see I have different ids, df is time, for example, 00:20 = 20 minutes.

I want to calculate the average for duplicated IDs for each df. I want to keep the order as it is to get the following table. I want to see NA just blank. I gave you a sample of my data but I have multiple columns.

id1 df1 id2 df2 id3 df3
22  71.66   44  60  78  43.33
32  60  45  85  90  105
33  20  NA  NA  50  101

Upvotes: 1

Views: 130

Answers (1)

akrun
akrun

Reputation: 887231

We could use map to loop over the sequence of integers in the column names, do a group by mean of the period converted 'df' columns and converted the period to seconds and minutes (period_to_seconds and divide by 60)

library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)
library(versions)
# // if rowr is not installed, install it first
available.versions('rowr') # // check for available version.  Not in CRAN
install.versions('rowr', '1.1.2') # // install a version
library(rowr)
map(as.character(1:3), ~ 
    df %>% 
         dplyr::select(ends_with(.x)) %>%             
         group_by(across(all_of(1))) %>% 
         summarise(across(everything(),  ~ 
                  mean(period_to_seconds(hm(.))/60))))%>% 
 invoke(cbind.fill, ., fill = NA)

-output

#  id1      df1 id2 df2 id3       df3
#1  22 71.66667  44  60  50 101.00000
#2  32 60.00000  45  85  78  43.33333
#3  33 20.00000  NA  NA  90 105.00000

Or another option is to split the dataset with split.default

split.default(df, as.integer(gl(ncol(df), 2, ncol(df)))) %>%
   map(~ .x %>%
           group_by(across(all_of(1))) %>% 
         summarise(across(everything(),  ~ 
                  mean(period_to_seconds(hm(.))/60)), .groups = 'drop'))%>% 
 invoke(cbind.fill, ., fill = NA)

-output

#  id1      df1 id2 df2 id3       df3
#1  22 71.66667  44  60  50 101.00000
#2  32 60.00000  45  85  78  43.33333
#3  33 20.00000  NA  NA  90 105.00000

Upvotes: 1

Related Questions