Reputation: 85
My code for importing a specific file so far looks like
df <- read_excel("File path",
col_types = c("numeric", "text", "numeric",
"numeric", "numeric", "numeric",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "numeric", "numeric", "numeric", "numeric",
"numeric", "numeric", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text"),
skip = 8)
How would I go about compressing specifically the "col types" sections but still keeping the same effect. I've tried sapply(df, as.numeric)
, but this changes all columns to numeric where I specifically need the second to be text.
Note: I understand other columns besides the second have "text", the example is a halfway point of my attempt.
Upvotes: 1
Views: 72
Reputation: 269694
Note that read_excel
will guess the types for you but if that does not work on your spreadsheet:
1) rep Use rep
like this:
col_types <- rep(c("numeric", "text", "numeric", "text", "numeric", "text"),
c(1L, 1L, 4L, 67L, 6L, 13L))
# test - col_types_orig defined in Note at end
identical(col_types, col_types_orig)
## [1] TRUE
2) rle We can also compress using rle
and then uncompress using inverse.rle
:
r <- rle(col_types_orig)
col_types <- inverse.rle(r)
identical(inverse.rle(r), col_types_orig)
## [1] TRUE
You can get r
as R code using dput(r)
. (In fact we got the arguments to rep
in (1) by examining this dput
output.)
3) Noting that there are 92 elements in col_types_orig
and all are text except for a few numeric we can do this:
length(col_types_orig)
## [1] 92
table(col_types_orig)
## col_types_orig
## numeric text
## 11 81
which(col_types_orig == "numeric")
## [1] 1 3 4 5 6 74 75 76 77 78 79
col_types <- replace(rep("text", 92), c(1, 3:6, 74:79), "numeric")
identical(col_types, col_types_orig)
## [1] TRUE
Note:
col_types_orig <-
c("numeric", "text", "numeric", "numeric", "numeric", "numeric",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text",
"text", "text", "text", "numeric", "numeric", "numeric", "numeric",
"numeric", "numeric", "text", "text", "text", "text", "text",
"text", "text", "text", "text", "text", "text", "text", "text"
Upvotes: 3
Reputation: 4534
If you are happy to take two steps to do the import you could read all in first as text and then convert the relevant columns to numeric using dplyr::mutate_at
:
library(tidyverse)
library(readxl)
df <- read_excel("File path", col_types = "text", skip = 8) %>%
mutate_at(c(1, 3:6, 74:79), as.numeric)
Upvotes: 2