D. Grayson
D. Grayson

Reputation: 85

How to compress imported column types in R

My code for importing a specific file so far looks like

df <- read_excel("File path", 
       col_types = c("numeric", "text", "numeric", 
         "numeric", "numeric", "numeric", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "numeric", "numeric", "numeric", "numeric", 
         "numeric", "numeric", "text", "text", "text", "text", "text", "text", 
         "text", "text", "text", "text", "text", "text", "text"), 
       skip = 8)

How would I go about compressing specifically the "col types" sections but still keeping the same effect. I've tried sapply(df, as.numeric), but this changes all columns to numeric where I specifically need the second to be text.

Note: I understand other columns besides the second have "text", the example is a halfway point of my attempt.

Upvotes: 1

Views: 72

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 269694

Note that read_excel will guess the types for you but if that does not work on your spreadsheet:

1) rep Use rep like this:

col_types <- rep(c("numeric", "text", "numeric", "text", "numeric", "text"),
                 c(1L, 1L, 4L, 67L, 6L, 13L))

# test - col_types_orig defined in Note at end
identical(col_types, col_types_orig) 
## [1] TRUE

2) rle We can also compress using rle and then uncompress using inverse.rle:

r <- rle(col_types_orig)
col_types <- inverse.rle(r)

identical(inverse.rle(r), col_types_orig)
## [1] TRUE

You can get r as R code using dput(r). (In fact we got the arguments to rep in (1) by examining this dput output.)

3) Noting that there are 92 elements in col_types_orig and all are text except for a few numeric we can do this:

length(col_types_orig)
## [1] 92

table(col_types_orig)
## col_types_orig
## numeric    text 
##      11      81 

which(col_types_orig == "numeric")
## [1]  1  3  4  5  6 74 75 76 77 78 79

col_types <- replace(rep("text", 92), c(1, 3:6, 74:79), "numeric")

identical(col_types, col_types_orig)
 ## [1] TRUE

Note:

col_types_orig <- 
c("numeric", "text", "numeric", "numeric", "numeric", "numeric", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text", 
"text", "text", "text", "numeric", "numeric", "numeric", "numeric", 
"numeric", "numeric", "text", "text", "text", "text", "text", 
"text", "text", "text", "text", "text", "text", "text", "text"

Upvotes: 3

markdly
markdly

Reputation: 4534

If you are happy to take two steps to do the import you could read all in first as text and then convert the relevant columns to numeric using dplyr::mutate_at:

library(tidyverse)
library(readxl)

df <- read_excel("File path", col_types = "text", skip = 8) %>%
  mutate_at(c(1, 3:6, 74:79), as.numeric)

Upvotes: 2

Related Questions