user1

Reputation: 444

r: read_csv, cols(): Specify multiple column types at once

Is it possible to specify multiple column types with one assignment in cols() from read_csv?

Instead of:

read_csv2(my_file,
          col_types = cols(.default = 'i',
                           logi_one = 'l',
                           logi_two = 'l',
                           date_one = 'D',
                           date_two = 'D'))

I want to do something like:

read_csv2(my_file,
          col_types = cols(.default = 'i',
                           c(logi_one, logi_two) = 'l',
                           c(date_one, date_two) = 'D'))

Upvotes: 3

Views: 4288

Answers (3)

alejandro_hagan

Reputation: 1003

This is my first answer to a Stack Overflow question. I had a similar question a while back, so I played around with this one, and while the other solutions may be valid, I wanted to provide an alternative.

  1. Assign the column names you want to vectors, e.g.
library(readr)
library(purrr)

custom_col_logic <- c("logi_one", "logi_two")

custom_col_date <- c("date_one", "date_two")
  2. Then use map() on each vector to build a list of col_logical() or col_date() collectors, one per column, and name each list with its column names.
# one col_logical() / col_date() collector per column
type_logical <- map(custom_col_logic, ~ col_logical())

type_date <- map(custom_col_date, ~ col_date())

# now name each collector after its column
names(type_logical) <- custom_col_logic

names(type_date) <- custom_col_date
  3. Here is the trick: use the as.col_spec() function to turn these two named lists into objects of class col_spec.
type_logical <- as.col_spec(type_logical)
type_date <- as.col_spec(type_date)
  4. Lastly, create a new cols() object and put the collectors from both specs into its cols element.
# new col_spec, with integer as the default type (as in the question)
custom_col_type <- cols(.default = col_integer())

# each col_spec stores its named collectors in $cols, so combine those lists
custom_col_type$cols <- c(type_logical$cols, type_date$cols)

Then you are done! Now you can pass custom_col_type directly to the col_types argument of read_csv() (or read_csv2()).
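For comparison, here is a shorter sketch of the same idea that skips the as.col_spec() step: build one named list of collectors and splice it into cols() with do.call(). The sample file and its contents are made up for illustration, and .default = col_integer() mirrors .default = 'i' from the question.

```r
library(readr)

custom_col_logic <- c("logi_one", "logi_two")
custom_col_date  <- c("date_one", "date_two")

# One named list of collectors: one col_logical() / col_date() per column name
col_list <- c(
  setNames(rep(list(col_logical()), length(custom_col_logic)), custom_col_logic),
  setNames(rep(list(col_date()), length(custom_col_date)), custom_col_date)
)

# do.call() passes each list element to cols() as a separate named argument
custom_col_type <- do.call(cols, c(col_list, list(.default = col_integer())))

# Made-up semicolon-delimited sample file, for illustration only
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "int_one;logi_one;date_one;logi_two;date_two",
  "9;TRUE;2022-01-18;FALSE;2022-02-15"
), tmp)

df <- read_csv2(tmp, col_types = custom_col_type)
```

Columns not named in col_list (here int_one) fall back to the .default collector.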

Thanks!

If you found this helpful, please upvote it or mark it as the accepted answer.

Upvotes: 0

zephryl

Reputation: 17069

Here's a wrapper around readr::cols() that allows you to set types on multiple columns at once.

library(tidyverse)

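my_cols <- function(..., .default = col_guess()) {
  # capture each argument unevaluated, e.g. l = c(logi_one, logi_two)
  dots <- enexprs(...)
  colargs <- flatten_chr(unname(
    imap(dots, ~ {
      # split the c(...) call into symbols and drop the `c` itself
      colnames <- syms(.x)
      colnames <- colnames[colnames != sym("c")]
      # repeat the type code (the argument name) once per column
      coltypes <- rep_along(colnames, .y)
      purrr::set_names(coltypes, colnames)
    })
  ))
  # splice the named type codes into cols() as individual arguments
  cols(!!!colargs, .default = .default)
}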

Example use:

set.seed(1)

# write sample .csv file
write_csv2(
  data.frame(
    int_one = sample(1:10, 10),
    logi_one = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_one = paste0("2022-01-", sample(10:31, 10)),
    int_two = sample(1:10, 10),
    logi_two = sample(c(TRUE, FALSE), 10, replace = TRUE),
    date_two = paste0("2022-02-", sample(10:28, 10))
  ),
  "my_file.csv"
)

read_csv2(
  "my_file.csv",
  col_types = my_cols(
    .default = 'i',
    l = c(logi_one, logi_two),
    D = c(date_one, date_two)
  )
)
#> # A tibble: 10 x 6
#>    int_one logi_one date_one   int_two logi_two date_two  
#>      <int> <lgl>    <date>       <int> <lgl>    <date>    
#>  1       9 TRUE     2022-01-18       1 FALSE    2022-02-15
#>  2       4 TRUE     2022-01-24       4 FALSE    2022-02-16
#>  3       7 TRUE     2022-01-14       3 FALSE    2022-02-19
#>  4       1 TRUE     2022-01-31       6 TRUE     2022-02-28
#>  5       2 TRUE     2022-01-23       2 TRUE     2022-02-17
#>  6       5 FALSE    2022-01-29       7 FALSE    2022-02-23
#>  7       3 FALSE    2022-01-26       5 TRUE     2022-02-11
#>  8      10 FALSE    2022-01-11       8 FALSE    2022-02-22
#>  9       6 FALSE    2022-01-19       9 FALSE    2022-02-25
#> 10       8 TRUE     2022-01-28      10 TRUE     2022-02-20

Created on 2022-03-05 by the reprex package (v2.0.1)
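If the column names are already held in character vectors, the unevaluated c(logi_one, logi_two) syntax can be inconvenient. A standard-evaluation variant of the same wrapper is sketched below; cols_by_type() is a hypothetical helper, and it relies only on cols() accepting compact type codes such as "l" and "D".

```r
library(readr)

# Hypothetical helper: each named argument is a compact type code
# ("l", "D", ...) whose value is a character vector of column names
cols_by_type <- function(..., .default = col_guess()) {
  groups <- list(...)
  # repeat each type code once per column and name it by that column
  types <- unlist(lapply(names(groups), function(type) {
    setNames(rep(type, length(groups[[type]])), groups[[type]])
  }))
  # pass every column = type pair to cols() as its own argument
  do.call(cols, c(as.list(types), list(.default = .default)))
}

spec <- cols_by_type(
  .default = "i",
  l = c("logi_one", "logi_two"),
  D = c("date_one", "date_two")
)

# The long form from the question, for comparison
long_form <- cols(
  .default = "i",
  logi_one = "l", logi_two = "l",
  date_one = "D", date_two = "D"
)
```

Both go through cols(), so spec and long_form should describe the same column types.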

Upvotes: 3

AndrewGB

Reputation: 16856

Here is one possibility (though a little complicated and verbose). If you have a list of the columns that you want to change, you can build a single string for col_types: as documented in ?read_csv, col_types also accepts a compact string with one character per column (e.g., "iiDl"). Here, I read in the column names, then bind them to a one-row tibble holding the types of the columns that need to change. Then I replace any NA with the default type, i, collapse all column types into a single string, and use that string as col_types in read_csv().

library(tidyverse)

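col_classes <-
  bind_rows(
    # zero-row tibble: just the column names, all read as character
    read_csv(my_file, col_types = cols(.default = "c"))[0, ],
    # types for the columns that differ from the default
    tibble(
      logi_one = 'l',
      logi_two = 'l',
      date_one = 'D',
      date_two = 'D'
    )
  ) %>%
  # every other column gets the default type, i
  mutate(across(everything(), ~ replace_na(., "i"))) %>%
  # one-row tibble -> character vector -> single string
  unlist() %>%
  paste0(collapse = "")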

results <- read_csv(my_file, col_types = col_classes)

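However, the header read above uses read_csv(), so it assumes a comma-delimited file. read_csv2() accepts the same compact col_types string, so for a semicolon-delimited file you can use read_csv2() in both calls instead. Alternatively, you could collapse every row back down, like this: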

output <-
  data.frame(apply(read_csv(my_file), 1, function(x)
    paste(x, collapse = ",")))

names(output) <- paste(names(results), collapse = ",")
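A minimal sketch of the same compact-string idea applied directly to a semicolon-delimited file with read_csv2(), which accepts the same compact col_types string. The sample file is made up, overrides is a hypothetical name for the per-column types, and ifelse() with %in% stands in for the bind_rows()/replace_na() step.

```r
library(readr)

# Made-up semicolon-delimited sample file, for illustration only
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "int_one;logi_one;date_one;int_two;logi_two;date_two",
  "9;TRUE;2022-01-18;1;FALSE;2022-02-15"
), tmp)

# Column names in file order (every column read as character here)
col_names <- names(read_csv2(tmp, col_types = cols(.default = "c")))

# Types for the columns that differ from the default "i"
overrides <- c(logi_one = "l", logi_two = "l", date_one = "D", date_two = "D")

# One type code per column, in file order, collapsed into one string
types <- ifelse(col_names %in% names(overrides), overrides[col_names], "i")
col_string <- paste0(types, collapse = "")

results <- read_csv2(tmp, col_types = col_string)
```

Because the string is positional, it has to be built in the file's column order, which is why the header is read first.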

Upvotes: 0
