Seb_aj

Reputation: 435

Comparing Column names in R across various data frames

I am trying to compare the column names and classes of several data frames in R before undertaking any transformations and calculations. The code I have so far is below:

library(dplyr)
m1 <-  mtcars
m2 <-  mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <-  mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))

out <-  cbind(sapply(m1, class), sapply(m2, class), sapply(m3, class))

If someone can solve this for dataframes stored in a list, that would be great. All my dataframes are currently stored in a list, for easier processing.

All.list <- list(m1,m2,m3)

I expect the output in matrix form, as in "out" above. The content of "out" itself is not what I want, though, because it is incorrect. I am expecting something more along the lines of the following:

(image: expected output)
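For context, here is a small check of why "out" comes out wrong. It is only a sketch built on the sample data above; the name `classes` is introduced here for illustration. The three class vectors have different lengths, and cbind() pairs them by position rather than by column name.

```r
library(dplyr)

# Same sample data as in the question
m1 <- mtcars
m2 <- mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <- mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))

# One named character vector of classes per data frame
classes <- list(sapply(m1, class), sapply(m2, class), sapply(m3, class))

# The vectors differ in length (11, 12, 12), so cbind() has to recycle the shortest
lengths(classes)

# Position 12 is xxxx1 in m2 but xxxx2 in m3, so those two end up in the same row
names(classes[[2]])[12]
names(classes[[3]])[12]
```

So any fix needs to match on column names, not on position.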

Upvotes: 5

Views: 6115

Answers (2)

Sam Firke

Reputation: 23014

Try compare_df_cols() from the janitor package:

library(janitor)
compare_df_cols(All.list)

#>    column_name All.list_1 All.list_2 All.list_3
#> 1           am    numeric    numeric    numeric
#> 2         carb    numeric    numeric    numeric
#> 3          cyl    numeric     factor     factor
#> 4         disp    numeric    numeric    numeric
#> 5         drat    numeric    numeric    numeric
#> 6         gear    numeric    numeric    numeric
#> 7           hp    numeric    numeric    numeric
#> 8          mpg    numeric    numeric    numeric
#> 9         qsec    numeric    numeric    numeric
#> 10          vs    numeric    numeric    numeric
#> 11          wt    numeric    numeric    numeric
#> 12       xxxx1       <NA>     factor       <NA>
#> 13       xxxx2       <NA>       <NA>     factor

It accepts either a list or the individual named data.frames, e.g., compare_df_cols(m1, m2, m3).

Disclaimer: I maintain the janitor package to which this function was recently added - posting it here as it addresses exactly this use case.

Upvotes: 7

Henry Cyranka

Reputation: 3060

I think the easiest way would be to define a function, and then use a combination of lapply and dplyr to obtain the result you want. Here is how I did it.

library(dplyr)
library(tidyr)  # spread() comes from tidyr, not dplyr
m1 <-  mtcars
m2 <-  mtcars %>% mutate(cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <-  mtcars %>% mutate(cyl = factor(cyl), xxxx2 = factor(cyl))

All.list <- list(m1,m2,m3)


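## Define a function that returns each column's name and class
my_function <- function(data_frame) {
  tibble(var_name = colnames(data_frame),
         # class(col)[1] keeps exactly one class per column
         var_type = vapply(data_frame, function(col) class(col)[1], character(1)))
}


## Tag each summary with its position in the list, stack them,
## then widen to one column per data frame
target <- lapply(seq_along(All.list), function(i) {
  my_function(All.list[[i]]) %>% mutate(element = i)
}) %>%
  bind_rows() %>%
  spread(element, var_type)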

target
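If you would rather avoid extra packages, the same idea can be sketched in base R. This is only a rough sketch: `compare_classes` and the `df_1`, `df_2`, ... column labels are names made up for this example, and it reports only the first class of each column.

```r
# Same sample data as in the question, built with base R only
m1 <- mtcars
m2 <- transform(mtcars, cyl = factor(cyl), xxxx1 = factor(cyl))
m3 <- transform(mtcars, cyl = factor(cyl), xxxx2 = factor(cyl))
All.list <- list(m1, m2, m3)

# Hypothetical helper: one row per column name, one column per data frame
compare_classes <- function(dfs) {
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  out <- sapply(dfs, function(df) {
    cls <- vapply(df, function(col) class(col)[1], character(1))
    unname(cls[all_cols])  # look up by name; NA where the column is missing
  })
  rownames(out) <- all_cols
  # Use the list names if present, otherwise fall back to df_1, df_2, ...
  colnames(out) <- if (is.null(names(dfs))) paste0("df_", seq_along(dfs)) else names(dfs)
  out
}

res <- compare_classes(All.list)
res
```

The result is a character matrix, with NA marking a column that is absent from a given data frame.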

Upvotes: 1
