luchonacho
luchonacho

Reputation: 7167

Select columns with fread based on both index and column names

I'm trying to read a file with fread, and select columns based on both index and column names. I can do this with dplyr but cannot with data.table. Just learning the latter. Any idea how to do it? Can't find the solution online or help file. A dummy example:

library(readr)
library(data.table)
# Create dummy data
DT <- data.table(ID = 1:50,
                Code = sample(LETTERS[1:4], 50, replace = T),
                State = rep(c("Alabama","Indiana","Texas","Nevada"), 50))
# Export to csv
write_csv(DT,"test.csv")
rm(DT)
# Import
DT <- fread("test.csv", select = "ID") # col name
DT <- fread("test.csv", select = c(2)) # col index
DT <- fread("test.csv", select = c("ID") | c(2)) # both = ERROR
DT <- fread("test.csv")[c("ID") | c(2)] # Error too (NOT IDEAL since loading all data anyway)
# Dplyr's approach
DT <- read_csv("test.csv", col_select = c("ID") | c(2)) # Works!

Upvotes: 0

Views: 239

Answers (1)

r2evans
r2evans

Reputation: 160687

A slight adaptation of https://stackoverflow.com/a/62207245/3358272:

cols <- colnames(fread("test.csv", nrows=0))
cols
# [1] "ID"    "Code"  "State"
fread("test.csv", select=which(cols %in% "ID" | seq_along(cols) %in% 2))
#         ID   Code
#      <int> <char>
#   1:     1      D
#   2:     2      D
#   3:     3      B
#   4:     4      C
#   5:     5      D
#   6:     6      A
#   7:     7      C
#   8:     8      B
#   9:     9      B
#  10:    10      A
#  ---             
# 191:    41      D
# 192:    42      B
# 193:    43      D
# 194:    44      D
# 195:    45      D
# 196:    46      C
# 197:    47      D
# 198:    48      A
# 199:    49      A
# 200:    50      D

Upvotes: 2

Related Questions