Reputation: 501
I am reading in a fairly large data file with about 300 variables. I am using readr::read_csv(), which does the job for 95% of the columns. However, for 8 variables I get a parsing error. These variables are guessed to be logicals, but they are in fact character. Now, I don't want to manually specify all column types, since 95% are parsed correctly. I only want to set the column type for those columns that get a parsing error. How can I do that?
Upvotes: 2
Views: 2587
Reputation: 196
You can set the data types of specific columns using the col_types argument, for example:
df <- read_csv(
  "my_file.csv",
  col_types = cols(
    x = col_character(), # or the string abbreviation "c"
    y = col_logical()
  )
)
with x and y being the column names. Note that all other columns will still be parsed automatically, so you don't need to specify every column. See the readr documentation (section "Available column specifications") for more details!
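Since you only want to override the columns that fail, one sketch of that workflow (assuming readr 2.x, where problems() reports the offending column indices; my_file.csv is a placeholder) is to read once with guessed types, collect the problem columns, and re-read with those columns forced to character:

```r
library(readr)

# First pass: let readr guess all column types.
df <- read_csv("my_file.csv")

# Column indices that produced parsing errors.
bad_cols <- unique(problems(df)$col)

# Take the guessed spec and force only the offending columns to character.
col_spec <- spec(df)
col_spec$cols[bad_cols] <- rep(list(col_character()), length(bad_cols))

# Second pass: re-read with the corrected spec; the other 95% stay as guessed.
df <- read_csv("my_file.csv", col_types = col_spec)
```

This keeps the automatic guessing for the well-behaved columns and only pins down the ones that mis-parse.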
Upvotes: 2
Reputation: 3088
You can try using library(data.table) and the fread function, for example data <- fread("fantastic_cluster.csv"). With colClasses = c("date" = "character") you can also specify the types of some columns. If you post a subset of your data I might be more helpful.
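A minimal sketch of that route (the file name and the column names date and status are placeholders): colClasses takes a named vector mapping specific columns to types, and all unnamed columns are still auto-detected.

```r
library(data.table)

# Force only the mis-parsing columns to character; the rest are guessed as usual.
dt <- fread(
  "fantastic_cluster.csv",
  colClasses = c(date = "character", status = "character")
)
```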
Upvotes: 0
Reputation: 16978
Just set the type for those remaining columns manually. For example, if all remaining columns that caused a parsing error should be logicals, you could use
library(dplyr)

df %>%
  mutate(across(c("COLUMNS_TO_CHANGE"), as.logical))
There are several ways to select multiple columns inside across(), for example starts_with(), ends_with(), or matches(), depending on how the columns are named and whether there is a pattern to the columns you want to select.
Take a look at the tidyverse packages, especially dplyr and tidyr, for tasks like that. The book R for Data Science is a good start.
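For example, if the problem columns happen to share a common prefix (here the hypothetical flag_), a tidyselect helper converts them all without listing each name:

```r
library(dplyr)

# Toy data standing in for the mis-parsed columns.
df <- tibble::tibble(
  flag_a = c("TRUE", "FALSE"),
  flag_b = c("T", "F"),
  id     = 1:2
)

# Convert every column whose name starts with "flag_" to logical.
df <- df %>%
  mutate(across(starts_with("flag_"), as.logical))
# flag_a and flag_b are now logical; id is untouched.
```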
Upvotes: 0