Reputation: 165

Function for checking data type for several data frames R

I would like to create a function whose input is an unknown (varying) number of data frames and whose output is a data frame giving the data type of each column of each input data frame.

Example: I have the 2 data frames below (the number of data frames can vary, so I am not sure how to pass them as function arguments).


# Dataframe 1
kpi_id <- c("SL",  "OOS")
kpi_val <- c (1,2)

df1 <-  data.frame(kpi_id,   kpi_val)

> sapply(df1, class)

   kpi_id     kpi_val 
"character"   "numeric"

# Dataframe 2
kpi_id <- c("SL",  "OOS")
kpi_val <- c ("3", "4")

df2 <-  data.frame(kpi_id,   kpi_val)

> sapply(df2, class)
  kpi_id     kpi_val 
"character" "character"

I can get the result manually, as below:

library(dplyr)  # for bind_cols()

df_types1 <- as.data.frame(sapply(df1, class))
colnames(df_types1)[1] <- deparse(substitute(df1))


df_types2 <- as.data.frame(sapply(df2, class))
colnames(df_types2)[1] <- deparse(substitute(df2))


df_types3 <- bind_cols(df_types1, df_types2)

> df_types3
              df1       df2
kpi_id  character   character
kpi_val   numeric   character

How can I create a function that produces the same output when the number of data frames is not known in advance?

Upvotes: 2

Answers (4)

Sam Firke

Reputation: 23064

library(janitor)
compare_df_cols(df1, df2)

  column_name       df1       df2
1      kpi_id character character
2     kpi_val   numeric character

Upvotes: 1

AndrewGB

Reputation: 16876

Here is another option using tidyverse, with janitor and data.table used to get the result into the desired format:

library(tidyverse)

lst(df1, df2) %>%
  map_dfr(., ~ map_df(.x, class), .id = "var") %>%
  data.table::transpose(keep.names = "var") %>%
  janitor::row_to_names(1) %>%
  as_tibble() %>%
  column_to_rownames("var")

Output

              df1       df2
kpi_id  character character
kpi_val   numeric character

Upvotes: 1

jay.sf

Reputation: 73842

Using rapply.

rapply(list(df1=df1, df2=df2), class, how='l') |>
  do.call(what='cbind')
#                 df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"

If you get weird output due to multiple classes,

df1$date <- df2$date <- as.POSIXct(Sys.Date())

rapply(list(df1=df1, df2=df2), class, how='l') |>
  do.call(what='cbind')
#                df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    character,2 character,2

you could use data.class, which returns just the first class:

rapply(list(df1=df1, df2=df2), data.class, how='l') |>
  do.call(what='cbind')
#                df1         df2        
# kpi_id  "character" "character"
# kpi_val "numeric"   "character"
# date    "POSIXct"   "POSIXct"

Upvotes: 1

langtang

Reputation: 24907

Here is a function you can use; pass it a list of data frames, named or unnamed:

df_types <- function(dfs) {
  do.call(
    rbind, 
    lapply(seq_along(dfs), function(d) {
        data.frame(
          df = if (is.null(names(dfs))) d else names(dfs)[d],
          col = names(dfs[[d]]),
          type=sapply(dfs[[d]],typeof),row.names = NULL)
      })
  )
}

Usage

df_types(list("a" = df1,"b" = df2))

Output:

  df     col      type
1  a  kpi_id character
2  a kpi_val    double
3  b  kpi_id character
4  b kpi_val character
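
If you would rather pass the data frames directly instead of wrapping them in a list, and get the wide layout from the question, the same idea can be sketched with `...` in base R. This is only a sketch under some assumptions: the name `col_types_wide` is made up for illustration, it needs R >= 4.0 for `deparse1()`, and it uses `data.class()` rather than `typeof()`, so a double column is reported as "numeric" rather than "double".

```r
# Hypothetical helper; the name col_types_wide is not from any package.
col_types_wide <- function(...) {
  dfs <- list(...)
  # Recover the argument expressions (e.g. "df1", "df2") to use as column labels.
  labels <- unname(vapply(as.list(substitute(list(...)))[-1], deparse1, character(1)))
  # Explicit argument names (a = df1) take precedence over the deparsed expressions.
  if (!is.null(names(dfs))) {
    labels <- ifelse(names(dfs) == "", labels, names(dfs))
  }
  # data.class() returns a single class name per column, even for multi-class columns.
  types <- lapply(dfs, function(d) vapply(d, data.class, character(1)))
  # Use the union of column names so inputs with different columns still line up;
  # a column missing from one data frame shows up as NA for that data frame.
  all_cols <- unique(unlist(lapply(types, names)))
  out <- data.frame(row.names = all_cols)
  for (i in seq_along(types)) {
    out[[labels[i]]] <- unname(types[[i]][all_cols])
  }
  out
}

# Usage: any number of data frames can be passed.
# col_types_wide(df1, df2)
```

With the question's `df1` and `df2`, this should give one row per column name and one column per data frame, as in the desired output.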

Upvotes: 1
