I am currently trying to compare the column classes and names of several data frames in R before undertaking any transformations and calculations. The code I have is shown below:
library(dplyr)
m1 <- mtcars
m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))
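# Naive attempt: m1 has 11 columns but m2 and m3 have 12, so cbind() recycles
# the shorter vector and lines the classes up by position, not by column name
out <- cbind(sapply(m1, class), sapply(m2, class), sapply(m3, class))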
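To see why out is wrong, it helps to compare how many classes each data frame contributes. The base-R sketch below rebuilds m1, m2 and m3 without dplyr (so it runs on its own) and shows that the class vectors have different lengths and hold different columns at the same position:

```r
# Rebuild the three example data frames with base R only
m1 <- mtcars
m2 <- mtcars
m2$cyl <- factor(m2$cyl)
m2$xxxx1 <- m2$cyl
m3 <- mtcars
m3$cyl <- factor(m3$cyl)
m3$xxxx2 <- m3$cyl

# One named character vector of classes per data frame
classes <- list(m1 = sapply(m1, class),
                m2 = sapply(m2, class),
                m3 = sapply(m3, class))

# Different lengths: m1 = 11, m2 = 12, m3 = 12
lengths(classes)

# Position 12 holds a different column in each 12-column data frame
names(classes$m2)[12]  # "xxxx1"
names(classes$m3)[12]  # "xxxx2"
```

Because cbind() matches by position, the 12th row of out mixes unrelated columns, and m1's 11 classes are recycled.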
If someone can solve this for data frames stored in a list, that would be great; all my data frames are currently stored in a list for easier processing.
All.list <- list(m1,m2,m3)
I would like the output to be displayed in matrix form, as in the matrix "out". However, "out" is not what I want, because its values are incorrect. I am expecting the output to look more like the following:
Try compare_df_cols() from the janitor package:
library(janitor)
compare_df_cols(All.list)
#> column_name All.list_1 All.list_2 All.list_3
#> 1 am numeric numeric numeric
#> 2 carb numeric numeric numeric
#> 3 cyl numeric factor factor
#> 4 disp numeric numeric numeric
#> 5 drat numeric numeric numeric
#> 6 gear numeric numeric numeric
#> 7 hp numeric numeric numeric
#> 8 mpg numeric numeric numeric
#> 9 qsec numeric numeric numeric
#> 10 vs numeric numeric numeric
#> 11 wt numeric numeric numeric
#> 12 xxxx1 <NA> factor <NA>
#> 13 xxxx2 <NA> <NA> factor
It accepts either a list of data frames or the individual data frames themselves, e.g., compare_df_cols(m1, m2, m3).
Disclaimer: I maintain the janitor package, to which this function was recently added; I am posting it here because it addresses exactly this use case.
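For readers who would rather avoid the extra dependency, here is a minimal base-R sketch of the same idea. The function name compare_cols() and its only_mismatch argument are hypothetical and are not part of janitor; like compare_df_cols(), it accepts either individual data frames or a single list of them, and it reports the first class of each column (NA where the column is absent).

```r
# Hypothetical helper, NOT janitor's implementation: accepts several data
# frames or one list of data frames and returns a column-by-frame class matrix
compare_cols <- function(..., only_mismatch = FALSE) {
  dfs <- list(...)
  # A single argument that is not itself a data frame is treated as a list
  if (length(dfs) == 1 && !is.data.frame(dfs[[1]])) dfs <- dfs[[1]]

  # Give unnamed inputs placeholder names df1, df2, ...
  nm <- names(dfs)
  if (is.null(nm)) nm <- rep("", length(dfs))
  nm[nm == ""] <- paste0("df", seq_along(dfs))[nm == ""]
  names(dfs) <- nm

  # Every column name that appears in at least one data frame
  all_cols <- sort(unique(unlist(lapply(dfs, names))))

  # Look up each column's first class, or NA if the frame lacks it
  out <- sapply(dfs, function(df) {
    vapply(all_cols, function(col) {
      if (col %in% names(df)) class(df[[col]])[1] else NA_character_
    }, character(1))
  })

  if (only_mismatch) {
    # Keep rows where the column is missing somewhere or the classes differ
    same <- apply(out, 1, function(r) !anyNA(r) && length(unique(r)) == 1)
    out <- out[!same, , drop = FALSE]
  }
  out
}

# Example data rebuilt with base R
m1 <- mtcars
m2 <- mtcars; m2$cyl <- factor(m2$cyl); m2$xxxx1 <- m2$cyl
m3 <- mtcars; m3$cyl <- factor(m3$cyl); m3$xxxx2 <- m3$cyl

compare_cols(list(m1, m2, m3))
compare_cols(m1, m2, m3, only_mismatch = TRUE)
```

Keeping only the first class is a simplification; columns with several classes (for example POSIXct) would need a different summary.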
I think the easiest way would be to define a function and then use a combination of lapply(), dplyr and tidyr to obtain the result you want. Here is how I did it.
library(dplyr)
library(tidyr)  # spread() is provided by tidyr, not dplyr
m1 <- mtcars
m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))
All.list <- list(m1, m2, m3)
## Define a function to get variable names and types
my_function <- function(data_frame) {
  tibble(var_name = colnames(data_frame),
         var_type = sapply(data_frame, class))
}
## Describe each data frame, tag its rows with its position in the list,
## stack the results, then spread them into one column per data frame
target <- lapply(seq_along(All.list), function(i) {
  my_function(All.list[[i]]) %>% mutate(element = i)
}) %>%
  bind_rows() %>%
  spread(element, var_type)
target
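The same long-to-wide idea can be sketched in base R, with a full outer join via merge() standing in for bind_rows() plus spread(). This is an alternative technique rather than the answer's own code; the type_1, type_2, ... column names are illustrative, and only the first class of each column is kept.

```r
# Example data rebuilt with base R
m1 <- mtcars
m2 <- mtcars; m2$cyl <- factor(m2$cyl); m2$xxxx1 <- m2$cyl
m3 <- mtcars; m3$cyl <- factor(m3$cyl); m3$xxxx2 <- m3$cyl
All.list <- list(m1, m2, m3)

# One (var_name, type_i) table per data frame, where i is its list position
long <- lapply(seq_along(All.list), function(i) {
  df <- All.list[[i]]
  res <- data.frame(
    var_name = names(df),
    type = unname(vapply(df, function(x) class(x)[1], character(1))),
    stringsAsFactors = FALSE
  )
  names(res)[2] <- paste0("type_", i)
  res
})

# Full outer join on var_name: columns missing from a frame become NA
target_base <- Reduce(function(a, b) merge(a, b, by = "var_name", all = TRUE),
                      long)
target_base
```

Because merge() matches rows by var_name, each column's classes line up correctly even though the data frames have different columns.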