Reputation: 63
My question might not be very clear, so here is an example.
My final goal is to produce:
final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e)
I have five data frames (one column each) with different lengths as follows:
df1
a
1. 1
2. 2
3. 4
4. 2
df2
b
1. 2
2. 6
df3
c
1. 2
2. 4
3. 3
df4
d
1. 1
2. 2
3. 4
4. 3
df5
e
1. 4
2. 6
3. 2
So I want a final data frame that includes them all, as follows:
finaldf
a b c d e
1. 1 2 2 1 4
2. 2 6 4 2 6
3. 4 NA 3 4 2
4. 2 NA NA 3 NA
I want the NAs in each column to be replaced with the mean of that column, so that all columns of finaldf
have the same length:
finaldf
a b c d e
1. 1 2 2 1 4
2. 2 6 4 2 6
3. 4 4 3 4 2
4. 2 4 3 3 4
That way I can compute final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e) as I need.
Upvotes: 0
Views: 39
Reputation: 9868
With purrr, dplyr and tidyr, we can first put all data frames in a list with mget(). Second, use set_names
to replace the data frame names with their respective column names. As a third step, extract the single column of each data frame as a vector with pluck.
Then pad the shorter vectors with NAs by making all vectors the same length.
Finally, bind all vectors back into a data frame with as.data.frame, then use mutate with across
and replace_na to fill the NAs in each column with that column's mean.
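library(dplyr)
library(purrr)
library(tidyr)  # provides replace_na()
# collect df1 ... df5 from the workspace into a named list
mget(ls(pattern = 'df\\d')) %>%
# name each list element after its single column
set_names(map_chr(., colnames)) %>%
# extract that column as a plain vector
map(pluck, 1) %>%
# pad the shorter vectors with NA up to the longest length
map(., `length<-`, max(lengths(.))) %>%
as.data.frame %>%
# replace the NAs in every column with that column's mean
mutate(across(everything(), ~replace_na(.x, mean(.x, na.rm=TRUE))))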
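In case you would rather not load any packages, the same steps (extract, pad with NA, fill with the column mean) can be sketched in base R. This is only a sketch, assuming the five data frames from the question; the helper names dfs, cols, padded and filled are made up for illustration.

```r
# Data frames from the question
df1 <- data.frame(a = c(1, 2, 4, 2))
df2 <- data.frame(b = c(2, 6))
df3 <- data.frame(c = c(2, 4, 3))
df4 <- data.frame(d = c(1, 2, 4, 3))
df5 <- data.frame(e = c(4, 6, 2))

# Put them in a list and pull out the single column of each as a vector
dfs <- list(df1, df2, df3, df4, df5)
cols <- lapply(dfs, `[[`, 1)
names(cols) <- vapply(dfs, names, character(1))

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(cols))
padded <- lapply(cols, function(x) { length(x) <- n; x })

# Replace the NAs in each vector with the mean of its non-missing values
filled <- lapply(padded, function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x })

finaldf <- as.data.frame(filled)
finaldf
```

Assigning to length() is what does the padding here: extending a vector this way fills the new positions with NA.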
Upvotes: 1
Reputation: 11686
The easiest way by far is to use the qpcR, dplyr and tidyr packages.
library(dplyr)
library(qpcR)
library(tidyr)
df1 <- data.frame(a=c(1,2,4,2))
df2 <- data.frame(b=c(2,6))
df3 <- data.frame(c=c(2,4,3))
df4 <- data.frame(d=c(1,2,4,3))
df5 <- data.frame(e=c(4,6,2))
mydf <- qpcR:::cbind.na(df1, df2, df3, df4, df5) %>%
tidyr::replace_na(., as.list(colMeans(., na.rm = TRUE)))
> mydf
a b c d e
1 1 2 2 1 4
2 2 6 4 2 6
3 4 4 3 4 2
4 2 4 3 3 4
Depending on your rgl settings, you might need to run the following at the top of your script to make the qpcR package load (see https://stackoverflow.com/a/66127391/2554330):
options(rgl.useNULL = TRUE)
library(rgl)
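Once the columns are filled, the formula from the question can be evaluated directly on the combined data frame. A small sketch, assuming the filled table shown in the output above (retyped here as finaldf so the snippet runs on its own):

```r
# Filled table, retyped from the output shown above
finaldf <- data.frame(
  a = c(1, 2, 4, 2),
  b = c(2, 6, 4, 4),
  c = c(2, 4, 3, 3),
  d = c(1, 2, 4, 3),
  e = c(4, 6, 2, 4)
)

# final = a*b + a*c*d + d*e, computed row by row
final <- with(finaldf, a * b + a * c * d + d * e)
final
# [1]  8 40 72 38
```

with() just saves typing finaldf$ in front of every column; the arithmetic is vectorised, so you get one value per row.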
Upvotes: 1