xiaodai

How to use col_types with readr's read_delim_chunked?

I am trying to read a file in chunks while specifying col_types. Here is a minimal working example (MWE):

write.csv(cars, "cars.csv")


library(readr)
readr::read_delim_chunked("cars.csv", function(x, i) {
  x
}, delim = ",", col_types = cols(
  speed = col_character()
), chunk_size = 10)

but instead of a data frame, the call returns

NULL

whereas the non-chunked version works fine:

library(readr)
readr::read_delim("cars.csv", delim = ",", col_types = cols(
  speed = col_character()
))

Answers (2)

xiaodai

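It turns out you need to wrap the function in DataFrameCallback$new(). When read_delim_chunked() receives a plain function, it wraps it in a SideEffectChunkCallback, which calls the function on every chunk but discards the return values, so the overall result is NULL. DataFrameCallback instead collects the data frame returned for each chunk and row-binds them. The col_types argument is not the cause.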

write.csv(cars, "cars.csv")

This works:

library(readr)
readr::read_delim_chunked("cars.csv", DataFrameCallback$new(function(x, i) {
  x
}), col_types = cols(
  speed = col_character()
), delim = ",", chunk_size = 10)

This returns NULL instead of a data frame:

readr::read_delim_chunked("cars.csv", function(x, i) {
  x
}, col_types = cols(
  speed = col_character()
), delim = ",", chunk_size = 10)
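A minimal sketch contrasting the two callback styles, assuming readr is installed and using a temporary file instead of cars.csv (the names path, spec, side_effect and combined are illustrative only). The plain function is routed through a side-effect callback and yields NULL, while DataFrameCallback returns all 50 rows with speed read as character:

```r
library(readr)

# Write the built-in cars data (50 rows) to a temporary CSV without row names
path <- tempfile(fileext = ".csv")
write.csv(cars, path, row.names = FALSE)

# Full column specification; speed is deliberately read as character
spec <- cols(speed = col_character(), dist = col_double())

# Plain function: wrapped in a SideEffectChunkCallback, which calls it on each
# chunk but throws the return values away, so the overall result is NULL
side_effect <- read_delim_chunked(path, function(x, i) x,
                                  delim = ",", col_types = spec, chunk_size = 10)

# DataFrameCallback: keeps each chunk's returned data frame and row-binds them
combined <- read_delim_chunked(path, DataFrameCallback$new(function(x, i) x),
                               delim = ",", col_types = spec, chunk_size = 10)

is.null(side_effect)   # TRUE
nrow(combined)         # 50
class(combined$speed)  # "character"
```

A plain function is still the right choice when each chunk is only used for a side effect, such as writing it to a database.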

akrun

One issue is that write.csv, by default, writes the row names as an extra, unnamed first column. Turn this off when writing the file:

write.csv(cars, "cars.csv", row.names = FALSE, quote = FALSE)
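A quick way to see the extra column, assuming temporary files in place of cars.csv (with_rn, without_rn, n_with and n_without are hypothetical names). Passing col_types = cols() simply suppresses readr's column-guessing message; readr may still report that it renamed the empty header:

```r
library(readr)

with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(cars, with_rn)                                       # default: row names written
write.csv(cars, without_rn, row.names = FALSE, quote = FALSE)  # speed and dist only

# The first file has an empty header field for the row-name column
n_with <- ncol(read_csv(with_rn, col_types = cols()))
n_without <- ncol(read_csv(without_rn, col_types = cols()))

n_with     # 3: row names + speed + dist
n_without  # 2: speed + dist
```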

Also, use col_character() (the function call) rather than col_character, and wrap the callback in DataFrameCallback$new():

library(readr)
readr::read_delim_chunked("cars.csv", DataFrameCallback$new(function(x, i) {
  x
}), col_types = cols(
  speed = col_character()
), delim = ",", chunk_size = 10)
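DataFrameCallback is most useful when each chunk is reduced before the pieces are combined. A sketch, assuming the same file written to a temporary path; the dist > 50 filter and the keep_long helper are hypothetical, chosen only to illustrate per-chunk processing:

```r
library(readr)

path <- tempfile(fileext = ".csv")
write.csv(cars, path, row.names = FALSE, quote = FALSE)

# Hypothetical per-chunk filter: keep only rows with dist > 50.
# Chunks with no matching rows return zero-row data frames, which rbind drops.
keep_long <- function(x, pos) x[x$dist > 50, ]

filtered <- read_delim_chunked(
  path,
  DataFrameCallback$new(keep_long),  # row-binds the filtered chunks
  delim = ",",
  col_types = cols(speed = col_character()),  # dist is still guessed
  chunk_size = 10
)

class(filtered$speed)                  # "character", as requested in col_types
nrow(filtered) == sum(cars$dist > 50)  # TRUE: same rows as filtering in memory
```

Only the filtered rows from each chunk are kept, which is the usual reason to read in chunks rather than with read_delim().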
