Reputation: 7167
I'm trying to read a file with fread
, and select columns based on both index and column names. I can do this with dplyr
but cannot with data.table
. Just learning the latter. Any idea how to do it? Can't find the solution online or help file. A dummy example:
library(readr)
library(data.table)
# Create dummy data
DT <- data.table(ID = 1:50,
Code = sample(LETTERS[1:4], 50, replace = T),
State = rep(c("Alabama","Indiana","Texas","Nevada"), 50))
# Export to csv
write_csv(DT,"test.csv")
rm(DT)
# Import
DT <- fread("test.csv", select = "ID") # col name
DT <- fread("test.csv", select = c(2)) # col index
DT <- fread("test.csv", select = c("ID") | c(2)) # both = ERROR
DT <- fread("test.csv")[c("ID") | c(2)] # Error too (NOT IDEAL since loading all data anyway)
# Dplyr's approach
DT <- read_csv("test.csv", col_select = c("ID") | c(2)) # Works!
Upvotes: 0
Views: 239
Reputation: 160687
A slight adaptation of https://stackoverflow.com/a/62207245/3358272:
cols <- colnames(fread("test.csv", nrows=0))
cols
# [1] "ID" "Code" "State"
fread("test.csv", select=which(cols %in% "ID" | seq_along(cols) %in% 2))
# ID Code
# <int> <char>
# 1: 1 D
# 2: 2 D
# 3: 3 B
# 4: 4 C
# 5: 5 D
# 6: 6 A
# 7: 7 C
# 8: 8 B
# 9: 9 B
# 10: 10 A
# ---
# 191: 41 D
# 192: 42 B
# 193: 43 D
# 194: 44 D
# 195: 45 D
# 196: 46 C
# 197: 47 D
# 198: 48 A
# 199: 49 A
# 200: 50 D
Upvotes: 2