Reputation: 23
I would like to average columns in a data set based on a unique identifier. I do not know ahead of time how many columns I will have for each unique identifier or what order they will come in. The unique IDs are all known before hand and are lists of weeks. I have found solutions for regular patterns but not solutions for using the actual column headers to sort out the average. Thanks for any and all help.
I present the original data and desired result. In the example there are only 2 unique IDs
x = read.table(text = "
site wk1 wk2 wk1 wk1
1 2 4 6 8
2 10 20 30 40
3 5 NA 2 3
4 100 100 NA NA",
sep = "", header = TRUE)
x
desired.outcome = read.table(text = "
site wk1avg wk2avg
1 3.3 4
2 26.6 20
3 3.3 NA
4 NA 100",
sep = "", header = TRUE)
Upvotes: 1
Views: 306
Reputation: 43344
Here's a tidyr
and dplyr
approach:
library(dplyr)
library(tidyr)
x %>% gather(wk, val, -site) %>% # gather wk* columns into key-value pairs
extract(wk, 'wk', '(wk\\d+).*?') %>% # trim suffixes added by read.table
group_by(site, wk) %>%
summarise(mean_val = mean(val)) %>% # calculate grouped means
spread(wk, mean_val) # spread back into wk* columns
# Source: local data frame [4 x 3]
# Groups: site [4]
#
# site wk1 wk2
# (int) (dbl) (dbl)
# 1 1 5.333333 4
# 2 2 26.666667 20
# 3 3 3.333333 NA
# 4 4 NA 100
Upvotes: 1
Reputation: 93851
If your original data file has duplicated column names, read.table
will change them so all the columns have unique values (as you can see by checking x
in your example after it's loaded). In fact, the code below depends on that happening, because melt
will drop columns with duplicated names. Then we use mutate
to remove the extra text added by read.table
to de-duplicate the column names so that we can group properly by week.
library(reshape2)
library(dplyr)
x %>% melt(id.var="site") %>% # Convert to long format
mutate(variable = gsub("\\..*", "", variable)) %>% # "re-duplicate" original column names
group_by(site, variable) %>%
summarise(mn = mean(value)) %>%
dcast(site ~ variable)
site wk1 wk2
1 1 5.333333 4
2 2 26.666667 20
3 3 3.333333 NA
4 4 NA 100
Upvotes: 3