user5946647

Loading files with fread

I'm trying to load only the 3rd column from every csv file in a folder. Each file has 20,000 rows and 4 columns, and I'd like to end up with a data frame of 20,000 rows and n columns (n = number of files; 6 in this case), where each column holds the 3rd column of one file. I tried using colClasses to specify the column I want to load, but without success. I also can't get the result to have n columns: I keep getting 4 columns, one for each variable found in the files. Any suggestions?

Upvotes: 1

Views: 1245

Answers (2)

Jaap

Reputation: 83245

Like I said in my comment, you can use the select or drop arguments of fread.
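As a minimal sketch of the two arguments, the snippet below writes a small 4-column csv to a temporary file and reads back only its 3rd column, once with select and once with drop. The sample data and the column names a, b, c, d are made up for illustration.

```r
library(data.table)

# Hypothetical sample file with 4 columns, written to a temporary location.
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(a = 1:5, b = 6:10, c = 11:15, d = 16:20), tmp)

# Keep only the 3rd column ...
only_third <- fread(tmp, select = 3L)

# ... or, equivalently, drop the other three.
also_third <- fread(tmp, drop = c(1L, 2L, 4L))

# Both results contain the single column "c".
names(only_third)
names(also_third)
```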

An alternative solution is to read the files with sapply and use the idcol parameter of rbindlist to create an id column. Next, you reshape the dataset to wide format as follows (you will need data.table 1.9.7 or later for this):

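library(data.table)
# files: character vector of csv paths; r: number of rows to read per file
# simplify = FALSE keeps a named list of data.tables, which rbindlist needs
DT <- rbindlist(sapply(files, fread, header=TRUE, nrows=r, select=3, simplify=FALSE), idcol = "id")
dcast(DT, rowid(id) ~ id, value.var="name-of-selected-column")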

The result is a data.table with the file names as column names.
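To try the long-to-wide approach without real data, here is a self-contained sketch. It creates three small csv files in a temporary folder (the folder, the file names and the column name "value" are all made up for the demo), stacks the 3rd column of each file with an id column, and then spreads the ids into columns. It uses lapply on a named vector instead of sapply, which yields the same kind of named list.

```r
library(data.table)

# Hypothetical demo data: three 4-column csv files in a fresh temporary folder.
demo_dir <- tempfile("fread_demo")
dir.create(demo_dir)
for (i in 1:3) {
  fwrite(data.table(a = 1:4, b = 5:8, value = i * (1:4), d = 9:12),
         file.path(demo_dir, sprintf("file%d.csv", i)))
}

# Name the paths by file name so that rbindlist can use the names as ids.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)

# Long format: one row per (file, row), with the file name in "id".
long <- rbindlist(lapply(files, fread, select = 3L), idcol = "id")

# Wide format: one column per file, one row per original row position.
wide <- dcast(long, rowid(id) ~ id, value.var = "value")
wide
```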

Upvotes: 2

cderv

Reputation: 6542

Try using the select argument of fread to keep columns, or drop to skip columns, rather than colClasses (with data.table 1.9.6).

Something like this:

ltab <- lapply(files, fread, header=TRUE, select = 3, nrows=r)

You should obtain a list of 6 tables, each with 20000 rows and 1 column.

Then Tab <- do.call("cbind", ltab) should work.
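Here is a runnable sketch of the same idea on made-up data. The temporary folder, the file names and the column name "value" are assumptions for the demo. Since every file contributes a column with the same name, the last step renames the columns after their source files, which is an optional extra and not part of the steps above.

```r
library(data.table)

# Hypothetical demo data: three 4-column csv files in a fresh temporary folder.
demo_dir <- tempfile("fread_cbind_demo")
dir.create(demo_dir)
for (i in 1:3) {
  fwrite(data.table(a = 1:4, b = 5:8, value = i * (1:4), d = 9:12),
         file.path(demo_dir, sprintf("file%d.csv", i)))
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One single-column table per file.
ltab <- lapply(files, fread, header = TRUE, select = 3L)

# Bind the tables side by side: one column per file.
Tab <- do.call("cbind", ltab)

# All columns are called "value" at this point; rename them after the files.
setnames(Tab, basename(files))
dim(Tab)
```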

Upvotes: 1
