Reputation: 309
I have a number of matrices that all have the same type of elements but different numbers of rows. The columns are the same in all files (let's call them "A" and "B"), but the rows, while mostly the same elements, are not always identical between files.
Here is some example data (in the form of data frames):
df1 <- data.frame(A = 1:3, B = 3:1)
rownames(df1)=c("alpha","beta","gamma")
df2 <- data.frame(A = 1:5,B = 5:1)
rownames(df2)=c("alpha","beta","delta","gamma","zeta")
df3 <- data.frame(A = 1:7, B = 7:1)
rownames(df3)=c("alpha","beta","delta","gamma","zeta","theta","epsilon")
As you can see, although "alpha", "beta" and "gamma" are always present, many of the other rows are not.
I would like to calculate two things. First, the average values of the A and B columns across all matrices, ideally by creating an ave.matr that has all row names and the mean values of columns "A" and "B":
A B
alpha 1 7
beta 2 6
delta 3 5
gamma 4 4
zeta 5 3
theta 6 2
epsilon 7 1
(where the above numbers are the mean values of all matrices)
Second, an occurrence matrix, let's call it occur.matr, that counts the number of occurrences of each row across all matrices. It should look like this:
A B
alpha 3
beta 3
delta 2
gamma 3
zeta 2
theta 1
epsilon 1
I have started working on this today but I cannot figure out how to do it.
I started by creating a list and a matrix with the unique rownames from all matrices
list=c(rownames(df1),rownames(df2),rownames(df3))
unique=unique(list)
avematr<-matrix(NA,nrow=length(unique),ncol=2)
My next step would be to make the row names of all matrices identical. I tried with match but I cannot figure it out, and at this point I don't even know if this is the best strategy. All the similar questions out there are about merging the matrices (which is not what I want to do).
Any help is greatly appreciated
Upvotes: 0
Views: 114
Reputation: 898
If you want to stick to base R:
For the averaging task, it is easier to first copy the row names into a regular column. That way the names survive rbind(), which otherwise renames duplicated row names (for example to "alpha1") to keep them unique. You can then loop over every unique row name and compute the averages. A quick and dirty solution could look like this:
df1 <- data.frame(A = 1:3, B = 3:1)
rownames(df1)=c("alpha","beta","gamma")
df2 <- data.frame(A = 1:5,B = 5:1)
rownames(df2)=c("alpha","beta","delta","gamma","zeta")
df3 <- data.frame(A = 1:7, B = 7:1)
rownames(df3)=c("alpha","beta","delta","gamma","zeta","theta","epsilon")
add_row_names_to_df <- function(df) {
df$rn <- rownames(df)
return(df)
}
new_df <- rbind(add_row_names_to_df(df1),
add_row_names_to_df(df2),
add_row_names_to_df(df3))
avg_df <- as.data.frame(matrix(unique(new_df$rn),
nrow = length(unique(new_df$rn)),
ncol = 3))
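# Fill each row with the name and the means of A and B for that name.
# Note: c() coerces the means to character because the name is a string.
for(i in 1:nrow(avg_df)) {
  avg_df[i,] <- c(avg_df[i,1],
                  mean(new_df$A[new_df$rn==avg_df[i,1]]),
                  mean(new_df$B[new_df$rn==avg_df[i,1]]))
}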
colnames(avg_df) <- c("rowname", "avgA", "avgB")
avg_df
results in:
rowname avgA avgB
1 alpha 1 5
2 beta 2 4
3 gamma 3.66666666666667 2.33333333333333
4 delta 3 4
5 zeta 5 2
6 theta 6 2
7 epsilon 7 1
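The loop above stores the means as character strings, because avg_df starts out as a character matrix. As a possible alternative, here is a minimal base R sketch that uses aggregate() so that the averaged columns stay numeric. The names stacked and ave_matr are made up for this illustration, and the result comes back sorted alphabetically by row name rather than in order of first appearance.

```r
# Same example data as in the question.
df1 <- data.frame(A = 1:3, B = 3:1,
                  row.names = c("alpha", "beta", "gamma"))
df2 <- data.frame(A = 1:5, B = 5:1,
                  row.names = c("alpha", "beta", "delta", "gamma", "zeta"))
df3 <- data.frame(A = 1:7, B = 7:1,
                  row.names = c("alpha", "beta", "delta", "gamma",
                                "zeta", "theta", "epsilon"))

# Stack the data frames, keeping each row name in a regular column "rn".
stacked <- do.call(rbind, lapply(list(df1, df2, df3), function(df) {
  data.frame(rn = rownames(df), df, row.names = NULL)
}))

# Group-wise means of A and B per row name; both columns remain numeric.
ave_matr <- aggregate(cbind(A, B) ~ rn, data = stacked, FUN = mean)
ave_matr
```

If a matrix with row names is preferred over a data frame, as.matrix(ave_matr[, c("A", "B")]) with rownames set from ave_matr$rn should get you there.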
For the occurrence matrix you can use R's table() function:
as.matrix(table(c(rownames(df1),rownames(df2),rownames(df3))))
yields:
[,1]
alpha 3
beta 3
delta 2
epsilon 1
gamma 3
theta 1
zeta 2
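Note that table() sorts the names alphabetically. If the order of first appearance matters, one option is to turn the names into a factor with explicit levels first. This is only a sketch: rn1, rn2 and rn3 are hypothetical stand-ins for rownames(df1), rownames(df2) and rownames(df3), and the column label "count" is an arbitrary choice.

```r
# Stand-ins for rownames(df1), rownames(df2), rownames(df3).
rn1 <- c("alpha", "beta", "gamma")
rn2 <- c("alpha", "beta", "delta", "gamma", "zeta")
rn3 <- c("alpha", "beta", "delta", "gamma", "zeta", "theta", "epsilon")

all_names <- c(rn1, rn2, rn3)

# Factor levels in order of first appearance, so table() keeps that order
# instead of sorting alphabetically.
occur <- table(factor(all_names, levels = unique(all_names)))

# One-column matrix with the names as row names.
occur_matr <- as.matrix(occur)
colnames(occur_matr) <- "count"
occur_matr
```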
Upvotes: 0
Reputation: 18551
Here is a tidyverse approach:
library(tidyverse)
df1 <- data.frame(A = 1:3, B = 3:1)
rownames(df1)=c("alpha","beta","gamma")
df2 <- data.frame(A = 1:5,B = 5:1)
rownames(df2)=c("alpha","beta","delta","gamma","zeta")
df3 <- data.frame(A = 1:7, B = 7:1)
rownames(df3)=c("alpha","beta","delta","gamma","zeta","theta","epsilon")
# Move the row names into a column called "id" and stack the data frames.
dat <- list(df1, df2, df3) %>%
  map_dfr(rownames_to_column, var = "id")
avg_dat <- dat %>%
  group_by(id) %>%
  summarise(A = mean(A),
            B = mean(B))
#> `summarise()` ungrouping output (override with `.groups` argument)
avg_dat
#> # A tibble: 7 x 3
#> id A B
#> <chr> <dbl> <dbl>
#> 1 alpha 1 5
#> 2 beta 2 4
#> 3 delta 3 4
#> 4 epsilon 7 1
#> 5 gamma 3.67 2.33
#> 6 theta 6 2
#> 7 zeta 5 2
occ_dat <- dat %>% count(id)
occ_dat
#> id n
#> 1 alpha 3
#> 2 beta 3
#> 3 delta 2
#> 4 epsilon 1
#> 5 gamma 3
#> 6 theta 1
#> 7 zeta 2
Created on 2021-01-27 by the reprex package (v0.3.0)
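Since both results are grouped by the same id column, the means and the counts could also be computed in a single summarise() call. The following is a sketch under the assumption that dplyr (1.0.0 or later, for the .groups argument) and tibble are installed; summary_dat is a name invented here.

```r
library(dplyr)
library(tibble)

df1 <- data.frame(A = 1:3, B = 3:1,
                  row.names = c("alpha", "beta", "gamma"))
df2 <- data.frame(A = 1:5, B = 5:1,
                  row.names = c("alpha", "beta", "delta", "gamma", "zeta"))
df3 <- data.frame(A = 1:7, B = 7:1,
                  row.names = c("alpha", "beta", "delta", "gamma",
                                "zeta", "theta", "epsilon"))

# Move row names into an "id" column, then stack all data frames.
dat <- bind_rows(lapply(list(df1, df2, df3), rownames_to_column, var = "id"))

# One pass: mean of A and B plus the number of data frames containing each id.
summary_dat <- dat %>%
  group_by(id) %>%
  summarise(A = mean(A), B = mean(B), n = n(), .groups = "drop")
summary_dat
```

This keeps the averages and the occurrence counts side by side, which may be convenient if the counts are later used to judge how reliable each mean is.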
Upvotes: 2