juandmaz
juandmaz

Reputation: 53

How to specify to R that I want to import certain columns with a particular class?

I have a problem with R and I don't know how to solve it. I am working with a database of 47228 observations and 68 columns of an excel sheet. The case is that I have 7 columns in which in the excel 47719 observations of 47728 (99.99%) appear as NA and the other cases as dates, and surprisingly when I import the base with the read_excel function of the readxl package, R import it as 100% as NA. In other words, in those 9 observations that are dates, R doesn't show the reliable content.

The problem is that I need to be able to analyze the database including that data and I cannot, because it does not appear.

As it is a problem that arises with the import, I am trying to explain to R that those 7 columns process them as dates, but I can't do it. The read_excel function has an argument which is col_types but I have to fill it with a vector that specifies the type of the 68 columns, and what I am looking for is to simply specify those 8 columns. Does anyone know how to do it? I saw that there is a function called read_table from the readr package that allows on import to specify the class of only the variables that interest me, with the argument cols_only =, but for some reason when I execute that function R aborts the session with a warning "R encountered a fatal error."

Does anyone know what this could be? Greetings and thank you very much. Sorry for my english.

#Code with which R imports those 7 columns as 100% NA

library(readxl)

base_neg<- read_excel("Data/Bases/Canonica/Listado 2012_2014 NEGATIVAS.xls")


#Code with which I can supposedly achieve what I want but R collapses, specifying only 1 of the 7 variables needed.

library(readr)

base_neg<- read_table("Data/Bases/Canonica/Listado 2012_2014 NEGATIVAS.xls", 
                      col_types = cols_only(fecha_colpo_nav = "D"))```

Upvotes: 0

Views: 754

Answers (1)

danlooo
danlooo

Reputation: 10637

Start by importing everything as characters. Then, you can use readr::type_convert to parse some columns afterwards. The argument .default will be applied to all other columns.

library(readxl)
library(readr)

data <- read_excel("file.xlsx", col_types = "text")
data
# A tibble: 2 x 4                                                                                                                                                                      
#  col1  col2  col3  col4 
#  <chr> <chr> <chr> <chr>
# 1 Foo   44542 1     TRUE 
# 2 bar   44541 2     FALSE

# set the type to all other columns to character
type_convert(data, col_types = cols(col4 = col_logical(), .default = col_character()))

# A tibble: 2 x 4                                                                                                                                                                      
#  col1  col2  col3  col4 
#  <chr> <chr> <chr> <lgl>
# 1 Foo   44542 1     TRUE 
# 2 bar   44541 2     FALSE

# guess the type of all other columns
# but enforce col4 to be logical
type_convert(data, col_types = cols(col4 = col_logical()))
# A tibble: 2 x 4                                                                                                                                                                      
#  col1   col2  col3 col4 
#  <chr> <dbl> <dbl> <lgl>
# 1 Foo   44542     1 TRUE 
# 2 bar   44541     2 FALSE

Upvotes: 1

Related Questions