maximusdooku

Why is fread ignoring the skip argument?

I have a .txt dataset where the first 12 lines are text, followed by 2 blank lines and then the data:

DATE           HEIGHT    INPUT     OUTPUT  TESTMEASURE
01/01/1933  NO RECORD   NO RECORD   MISSING     MISSING
01/02/1933  NO RECORD   NO RECORD   MISSING     MISSING

But when I run

dat <- fread('data.txt')

it skips 15 rows and uses the first data line as the column names of the imported dataset. It ignores the header line:

01/01/1933  NO RECORD   NO RECORD   MISSING     MISSING

The skip parameter is not affecting what I import at all. How can I specify which row should be used for the column names? Alternatively, I can rename the columns afterwards, but the first line of data shouldn't be lost.

DIAGNOSIS

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.001319 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... '\t'
Detected 5 columns. Longest stretch was from line 15 to line 30
Starting data input on line 15 (either column names or first row of data). First 10 characters: 01/01/1933
The line before starting line 15 is non-empty and will be ignored (it has too few or too many items to be column names or data): DATE           HEIGHT    INPUT    OUTPUT  TESTMEASURE
the fields on line 15 are character fields. Treating as the column names.

Answers (1)

CPak

You have 12 lines of text, 2 blank lines, and then your data. But I noticed extra whitespace between DATE and HEIGHT in the header line. I can reproduce the problem with a text file like this, where the data is tab-delimited and the header has 2 tabs between DATE and HEIGHT instead of 1:

garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage


DATE        HEIGHT  INPUT   OUTPUT  TESTMEASURE
01/01/1933  NO RECORD   NO RECORD   MISSING MISSING
01/02/1933  NO RECORD   NO RECORD   MISSING MISSING

Doing fread(data) gives me:

fread(data)
   01/01/1933 NO RECORD NO RECORD MISSING MISSING
1: 01/02/1933 NO RECORD NO RECORD MISSING MISSING

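With the extra tab the header splits into 6 fields while the data rows have 5, so fread rejects it as column names. Removing the extra tab between DATE and HEIGHT gives me: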

         DATE    HEIGHT     INPUT  OUTPUT TESTMEASURE
1: 01/01/1933 NO RECORD NO RECORD MISSING     MISSING
2: 01/02/1933 NO RECORD NO RECORD MISSING     MISSING
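
If editing the file by hand is not an option, the same fix can be sketched in code. This is only a sketch under assumptions that are not in the question: the header is the first line starting with DATE, the file is tab-delimited, and a temporary file with made-up contents stands in for data.txt. It also assumes a data.table version recent enough that fread has the text argument and honours a numeric skip. The first variant collapses repeated tabs in the header line only; the second skips past the header and supplies the column names by hand.

```r
library(data.table)

# Hypothetical stand-in for data.txt: 12 text lines, 2 blank lines,
# a header with an extra tab after DATE, then tab-delimited data.
path <- tempfile(fileext = ".txt")
writeLines(c(
  rep("garbage", 12), "", "",
  "DATE\t\tHEIGHT\tINPUT\tOUTPUT\tTESTMEASURE",
  "01/01/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING",
  "01/02/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING"
), path)

lines <- readLines(path)

# Locate the header (assumed to be the first line starting with "DATE").
hdr <- grep("^DATE", lines)[1]

# Variant 1: collapse runs of tabs in the header only, so it has
# 5 fields like the data rows, then parse from the header onwards.
lines[hdr] <- gsub("\t+", "\t", lines[hdr])
dat <- fread(text = lines[hdr:length(lines)], sep = "\t", header = TRUE)
print(dat)

# Variant 2: skip everything up to and including the header line and
# name the columns explicitly, so no data row is used as the header.
dat2 <- fread(path, skip = hdr, sep = "\t", header = FALSE,
              col.names = c("DATE", "HEIGHT", "INPUT", "OUTPUT", "TESTMEASURE"))
print(dat2)
```

Both variants keep 01/01/1933 as the first data row. Variant 1 only touches the header line, so empty fields in the data rows (two tabs in a row) are left alone.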
