How to compress imported column types in R

Question

My code for importing a specific file so far looks like

df <- read_excel("File path", 
       col_types = c("numeric", "text", "numeric", 
         "numeric", "numeric", "numeric", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "numeric", "numeric", "numeric", "numeric", 
         "numeric", "numeric", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text"), 
       skip = 8)

How would I go about compressing specifically the "col types" sections but still keeping the same effect. I've tried sapply(df, as.numeric), but this changes all columns to numeric where I specifically need the second to be text.

Note: I understand other columns besides the second have "text", the example is a halfway point of my attempt.

G. Grothendieck · Accepted Answer

Note that read_excel will guess the types for you but if that does not work on your spreadsheet:

1) rep Use rep like this:

col_types <- rep(c("numeric", "text", "numeric", "text", "numeric", "text"),
                 c(1L, 1L, 4L, 67L, 6L, 13L))

# test - col_types_orig defined in Note at end
identical(col_types, col_types_orig) 
## [1] TRUE

2) rle We can also compress using rle and then uncompress using inverse.rle:

r <- rle(col_types_orig)
col_types <- inverse.rle(r)

identical(inverse.rle(r), col_types_orig)
## [1] TRUE

You can get r as R code using dput(r). (In fact we got the arguments to rep in (1) by examining this dput output.)

3) Noting that there are 92 elements in col_types_orig and all are text except for a few numeric we can do this:

length(col_types_orig)
## [1] 92

table(col_types_orig)
## col_types_orig
## numeric    text 
##      11      81 

which(col_types_orig == "numeric")
## [1]  1  3  4  5  6 74 75 76 77 78 79

col_types <- replace(rep("text", 92), c(1, 3:6, 74:79), "numeric")

identical(col_types, col_types_orig)
 ## [1] TRUE

Note:

col_types_orig <- 
c("numeric", "text", "numeric", "numeric", "numeric", "numeric", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "numeric", "numeric", "numeric", "numeric", 
"numeric", "numeric", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text"

How to compress imported column types in R

Answers (2)

Related Questions