Reputation:
I'm trying to load only the 3rd column from all csv files in a folder. Each file has 20,000 rows and 4 columns, and I'd like to end up with a data frame of n columns (n = number of files; 6 in this case) and 20,000 rows. I tried using colClasses to specify the columns I want to load, but without success. Also, I can't get it to create n columns rather than 4 columns (where each column represents one variable found in all of the files). I'm trying to get a data frame with 20,000 rows and 6 columns, where each column holds the specified variable (the 3rd column) from one file. Any suggestions?
Upvotes: 1
Views: 1245
Reputation: 83245
As I said in my comment, you can use the select or drop arguments of fread.
An alternative solution is to read the files with sapply and use the idcol parameter of rbindlist to create an id column. Next, reshape the dataset to wide format as follows (you will need data.table 1.9.7 for this):
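library(data.table)
# files: assumed to be a character vector of csv paths; r: number of rows to read per file
# simplify = FALSE makes sapply return a named list of data.tables, which rbindlist expects
DT <- rbindlist(sapply(files, fread, header=TRUE, nrows=r, select=3, simplify=FALSE), idcol = "id")
# rowid(id) numbers the rows within each file; each file becomes one column
dcast(DT, rowid(id) ~ id, value.var="name-of-selected-column")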
The result is a data.table with the file names as column names.
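To make the reshape step concrete, here is a small self-contained sketch of the same idea. The temporary folder, the file names and the column name value are all made up for illustration, and lapply on a named vector is used in place of sapply; treat it as one possible way to wire the pieces together rather than the definitive version.

```r
library(data.table)

# Hypothetical setup: three small csv files in a fresh temporary folder
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
for (i in 1:3) {
  fwrite(data.table(a = 1:5, b = 6:10, value = i * (1:5), d = 11:15),
         file.path(demo_dir, sprintf("file%d.csv", i)))
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)  # these names end up in the id column

# Read only the 3rd column of each file and stack the results in long format
DT <- rbindlist(lapply(files, fread, header = TRUE, select = 3), idcol = "id")

# Reshape to wide format: one row per row number, one column per file
wide <- dcast(DT, rowid(id) ~ id, value.var = "value")
```

Here wide should contain one column per file in addition to the row-number column.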
Upvotes: 2
Reputation: 6542
Try using the select argument of fread to keep columns, or drop to skip columns, instead of colClasses (with data.table 1.9.6).
Something like this:
ltab <- lapply(files, fread, header=TRUE, select = 3, nrows=r)
You should obtain a list of 6 tables of 20000 rows and 1 column. Then Tab <- do.call("cbind", ltab) should work.
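A minimal sketch of the lapply and cbind route on toy data follows. The temporary folder, file names and the column name value are hypothetical, and the final setnames call is an addition not mentioned above, used here only so that each column can be traced back to its file.

```r
library(data.table)

# Hypothetical setup: three small csv files in a fresh temporary folder
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
for (i in 1:3) {
  fwrite(data.table(a = 1:5, b = 6:10, value = i * (1:5), d = 11:15),
         file.path(demo_dir, sprintf("file%d.csv", i)))
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One single-column data.table per file (only the 3rd column is read)
ltab <- lapply(files, fread, header = TRUE, select = 3)

# Bind the columns side by side, then rename them after their source files
Tab <- do.call("cbind", ltab)
setnames(Tab, basename(files))
```

With the real data, the same calls should give a table of 20000 rows and 6 columns.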
Upvotes: 1