Reputation: 165
I would like to create a function that takes an arbitrary number of data frames as input and returns a data frame showing the data type of each column in each input data frame.
Example: I have two data frames below (the number of data frames can vary, so I am not sure how to pass them as function arguments).
# Dataframe 1
kpi_id <- c("SL", "OOS")
kpi_val <- c (1,2)
df1 <- data.frame(kpi_id, kpi_val)
> sapply(df1, class)
kpi_id kpi_val
"character" "numeric"
# Dataframe 2
kpi_id <- c("SL", "OOS")
kpi_val <- c ("3", "4")
df2 <- data.frame(kpi_id, kpi_val)
> sapply(df2, class)
kpi_id kpi_val
"character" "character"
I can get the result manually as below:
library(dplyr)  # for bind_cols()
# One-column data frame of classes, with the column named after the data frame
df_types1 <- as.data.frame(sapply(df1, class))
colnames(df_types1)[1] <- deparse(substitute(df1))
df_types2 <- as.data.frame(sapply(df2, class))
colnames(df_types2)[1] <- deparse(substitute(df2))
df_types3 <- bind_cols(df_types1, df_types2)
> df_types3
df1 df2
kpi_id character character
kpi_val numeric character
How can I write a function that produces the same output when the number of data frames is not known in advance?
Upvotes: 2
Views: 110
Reputation: 23064
library(janitor)
compare_df_cols(df1, df2)
column_name df1 df2
1 kpi_id character character
2 kpi_val numeric character
Upvotes: 1
Reputation: 16876
Here is another option using tidyverse, with the addition of janitor and data.table to get it into the desired format:
library(tidyverse)
lst(df1, df2) %>%
  map_dfr(., ~ map_df(.x, class), .id = "var") %>%
  data.table::transpose(keep.names = "var") %>%
  janitor::row_to_names(1) %>%
  as_tibble() %>%
  column_to_rownames("var")
Output
df1 df2
kpi_id character character
kpi_val numeric character
Upvotes: 1
Reputation: 73842
Using rapply.
rapply(list(df1=df1, df2=df2), class, how='l') |>
do.call(what='cbind')
# df1 df2
# kpi_id "character" "character"
# kpi_val "numeric" "character"
If you get weird output due to multiple classes,
df1$date <- df2$date <- as.POSIXct(Sys.Date())
rapply(list(df1=df1, df2=df2), class, how='l') |>
do.call(what='cbind')
# df1 df2
# kpi_id "character" "character"
# kpi_val "numeric" "character"
# date character,2 character,2
you could use data.class, which returns just the first one:
rapply(list(df1=df1, df2=df2), data.class, how='l') |>
do.call(what='cbind')
# df1 df2
# kpi_id "character" "character"
# kpi_val "numeric" "character"
# date "POSIXct" "POSIXct"
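If you would rather keep every class than only the first, one possible variation is to collapse the class vector into a single string. This is only a sketch: `collapse_class` is a hypothetical helper name, not part of base R, and the data frames are redefined here so the snippet runs on its own.

```r
# Example data, redefined so the snippet is self-contained
df1 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c(1, 2))
df2 <- data.frame(kpi_id = c("SL", "OOS"), kpi_val = c("3", "4"))
df1$date <- df2$date <- as.POSIXct(Sys.Date())

# Hypothetical helper: join all classes of a column into one string
collapse_class <- function(x) paste(class(x), collapse = "/")

# Apply it to every column of every data frame, then bind the results by column
res <- do.call(cbind, rapply(list(df1 = df1, df2 = df2), collapse_class, how = "list"))
res
```

With this, the `date` row shows "POSIXct/POSIXt" instead of a `character,2` placeholder.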
Upvotes: 1
Reputation: 24907
Here is a function you can use; pass it a list of data frames, named or unnamed:
df_types <- function(dfs) {
  do.call(
    rbind,
    lapply(seq_along(dfs), function(d) {
      data.frame(
        # Use the list name if there is one, otherwise the position in the list
        df = if (is.null(names(dfs))) d else names(dfs)[d],
        col = names(dfs[[d]]),
        # typeof() reports the storage type, e.g. "double" rather than "numeric"
        type = sapply(dfs[[d]], typeof),
        row.names = NULL
      )
    })
  )
}
Usage
df_types(list("a" = df1,"b" = df2))
Output:
df col type
1 a kpi_id character
2 a kpi_val double
3 b kpi_id character
4 b kpi_val character
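If you want to pass the data frames directly rather than wrapping them in a list, a variadic function using `...` is one way to do it. The following is a minimal sketch, not a definitive implementation: `col_classes` is a hypothetical name, and it assumes every data frame has the same columns in the same order.

```r
# Hypothetical helper: accepts any number of data frames via `...` and
# returns one column of classes per input data frame.
col_classes <- function(...) {
  dfs <- list(...)
  # Recover the argument expressions (e.g. df1, df2) to use as column names
  arg_names <- vapply(as.list(substitute(list(...)))[-1],
                      function(e) deparse(e)[1], character(1))
  # Prefer explicit names, as in col_classes(a = df1), where they were supplied
  if (!is.null(names(dfs))) {
    arg_names <- ifelse(names(dfs) == "", arg_names, names(dfs))
  }
  # data.class() returns a single string per column
  out <- lapply(dfs, function(d) vapply(d, data.class, character(1)))
  # Assumes identical columns across inputs, so the vectors line up row by row
  out <- as.data.frame(do.call(cbind, out))
  names(out) <- unname(arg_names)
  out
}
```

Calling `col_classes(df1, df2)` on the two data frames from the question gives a wide table with columns `df1` and `df2` and one row per column name, matching the layout asked for.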
Upvotes: 1