Reputation: 1270
I have the following data:
df<-read.table (text=" id1 df1 id2 df2 id3 df3
22 00:20 44 00:30 78 00:20
33 00:20 45 01:30 78 00:30
22 01:30 44 00:40 78 01:20
32 00:30 45 01:20 90 02:00
22 01:45 44 02:20 90 01:30
32 01:30 44 00:30 50 01:41
", header=TRUE)
As you can see I have different ids, df is time, for example, 00:20 = 20 minutes.
I want to calculate the average for duplicated IDs for each df. I want to keep the order as it is to get the following table. I want to see NA just blank. I gave you a sample of my data but I have multiple columns.
id1 df1 id2 df2 id3 df3
22 71.66 44 60 78 43.33
32 60 45 85 90 105
33 20 NA NA 50 101
Upvotes: 1
Views: 130
Reputation: 887231
We could use map
to loop over the sequence of integers in the column names, do a group by mean
of the period
converted 'df' columns and converted the period to seconds and minutes (period_to_seconds
and divide by 60)
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)
library(versions)
# // if rowr is not installed, install it first
available.versions('rowr') # // check for available version. Not in CRAN
install.versions('rowr', '1.1.2') # // install a version
library(rowr)
map(as.character(1:3), ~
df %>%
dplyr::select(ends_with(.x)) %>%
group_by(across(all_of(1))) %>%
summarise(across(everything(), ~
mean(period_to_seconds(hm(.))/60))))%>%
invoke(cbind.fill, ., fill = NA)
-output
# id1 df1 id2 df2 id3 df3
#1 22 71.66667 44 60 50 101.00000
#2 32 60.00000 45 85 78 43.33333
#3 33 20.00000 NA NA 90 105.00000
Or another option is to split the dataset with split.default
split.default(df, as.integer(gl(ncol(df), 2, ncol(df)))) %>%
map(~ .x %>%
group_by(across(all_of(1))) %>%
summarise(across(everything(), ~
mean(period_to_seconds(hm(.))/60)), .groups = 'drop'))%>%
invoke(cbind.fill, ., fill = NA)
-output
# id1 df1 id2 df2 id3 df3
#1 22 71.66667 44 60 50 101.00000
#2 32 60.00000 45 85 78 43.33333
#3 33 20.00000 NA NA 90 105.00000
Upvotes: 1