SHW

Reputation: 501

Set column type for some columns, but not for all, with readr::read_csv()

I am reading in a fairly large data file with about 300 variables. I am using readr::read_csv(), which does the job for about 95% of the columns. However, for 8 variables I get a parsing error. These variables are guessed to be logicals, but they are in fact character. I don't want to specify all column types manually, since 95% are parsed correctly. I only want to set the column type for the columns that produce a parsing error. How can I do that?

Upvotes: 2

Views: 2587

Answers (3)

malin-fischer

Reputation: 196

You can set the data types of specific columns using the argument col_types, for example:

library(readr)

df <- read_csv(
  "my_file.csv",
  col_types = cols(
    x = col_character(), # or string abbreviation "c"
    y = col_logical()
  )
)

with x and y being the column names. Note that all other columns will be parsed automatically, so you don't need to specify all columns. See the readr documentation (section Available column specifications) for more details!
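With eight problem columns it may be easier to build the specification from a vector of names than to type each entry. A minimal sketch, assuming the names are collected in a vector; the sample file and the names id, flag_a and flag_b are made up for illustration:

```r
library(readr)

# Hypothetical sample file; flag_a and flag_b would likely be guessed as logical.
path <- tempfile(fileext = ".csv")
writeLines(c("id,flag_a,flag_b", "1,T,F", "2,F,T"), path)

# Names of the columns that should be read as character (assumed).
bad_cols <- c("flag_a", "flag_b")

# Equivalent to cols(flag_a = col_character(), flag_b = col_character()).
spec <- do.call(
  cols,
  setNames(rep(list(col_character()), length(bad_cols)), bad_cols)
)

# Columns not named in spec are still guessed automatically.
df <- read_csv(path, col_types = spec)
```

Here id is left to the parser, while the two flag columns come back as character.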

Upvotes: 2

LDT

Reputation: 3088

Can you try the fread() function from the data.table package? For example, data <- fread("fantastic_cluster.csv"). With colClasses = c("date" = "character") you can also specify the type of individual columns. If you post a subset of your data I might be able to help more.
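A small self-contained sketch of that idea; the file contents and the column names id and date are invented for illustration:

```r
library(data.table)

# Hypothetical sample file; without an override, "date" would be read as a number.
path <- tempfile(fileext = ".csv")
writeLines(c("id,date", "1,20200101", "2,20200102"), path)

# Only "date" is overridden; the remaining columns are detected by fread.
dt <- fread(path, colClasses = c(date = "character"))
```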

Upvotes: 0

Martin Gal

Reputation: 16978

Just set the type for those remaining columns manually. For example, if all remaining columns that caused a parsing error should be logicals, you could use

library(dplyr)

df %>%
   mutate(across(c("COLUMNS_TO_CHANGE"), as.logical))

There are several ways to select multiple columns inside the across function, for example starts_with, ends_with, or matches, depending on how the columns are named and whether there is a pattern in the names of the columns to select.
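As a rough illustration of selecting by pattern, assuming the affected columns share a prefix (the data frame and the flag_ prefix below are made up):

```r
library(dplyr)

# Hypothetical data; the flag_ columns were read as character.
df <- tibble(
  id = 1:3,
  flag_a = c("TRUE", "FALSE", "TRUE"),
  flag_b = c("T", "F", NA),
  note = c("a", "b", "c")
)

# Convert only the columns whose names start with "flag_".
out <- df %>%
  mutate(across(starts_with("flag_"), as.logical))
```

Keep in mind that as.logical() returns NA for strings it does not recognise as true or false.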

Take a look at the tidyverse packages, especially dplyr and tidyr, for tasks like that. The book R for Data Science is a good start.

Upvotes: 0
