Reputation: 5542
I have a .txt dataset where the first 12 lines are text, followed by 2 blank rows and then the data:
DATE HEIGHT INPUT OUTPUT TESTMEASURE
01/01/1933 NO RECORD NO RECORD MISSING MISSING
01/02/1933 NO RECORD NO RECORD MISSING MISSING
But when I run
dat <- fread('data.txt')
it skips 15 rows and uses the first data line as the column names for the imported dataset, ignoring the header line:
01/01/1933 NO RECORD NO RECORD MISSING MISSING
The skip parameter does not affect what I import at all. How can I specify which row should be used for the column names? Alternatively, I can rename the columns afterwards, but the first line of data shouldn't be dropped. Here is the verbose output from fread:
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.001319 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... '\t'
Detected 5 columns. Longest stretch was from line 15 to line 30
Starting data input on line 15 (either column names or first row of data). First 10 characters: 01/01/1933
The line before starting line 15 is non-empty and will be ignored (it has too few or too many items to be column names or data): DATE HEIGHT INPUT OUTPUT TESTMEASURE
the fields on line 15 are character fields. Treating as the column names.
Upvotes: 0
Views: 691
Reputation: 13581
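You have 12 lines of text, 2 blank lines, and then your data. But I noticed extra whitespace between DATE and HEIGHT in your header line. An extra tab there gives the header one more field than the data rows, so fread rejects it as column names. I can reproduce the problem with a tab-delimited text file like this, with 2 tabs between DATE and HEIGHT instead of 1: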
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
garbage
DATE HEIGHT INPUT OUTPUT TESTMEASURE
01/01/1933 NO RECORD NO RECORD MISSING MISSING
01/02/1933 NO RECORD NO RECORD MISSING MISSING
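Because the whitespace in the listing above does not survive copy and paste, here is a minimal sketch of how such a test file could be written from R. The temporary file, the variable name data (assumed to hold the file path passed to fread), and the two blank lines are my assumptions based on the description in the question.

```r
# Hypothetical reproduction file: "garbage" stands in for the 12 lines of text.
data <- tempfile(fileext = ".txt")

file_lines <- c(
  rep("garbage", 12),
  "", "",                                         # the 2 blank rows
  "DATE\t\tHEIGHT\tINPUT\tOUTPUT\tTESTMEASURE",   # note the 2 tabs after DATE
  "01/01/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING",
  "01/02/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING"
)

# Write the lines out; `data` can then be passed to fread().
writeLines(file_lines, data)
```

Changing "\t\t" to "\t" in the header line gives the variant with a single tab.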
Doing fread(data) gives me:
fread(data)
01/01/1933 NO RECORD NO RECORD MISSING MISSING
1: 01/02/1933 NO RECORD NO RECORD MISSING MISSING
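One way to check for this kind of mismatch is to count the tab-separated fields on the header line and on a data line. This is only a sketch in base R; the two strings are my reconstruction of the lines, assuming 2 tabs after DATE.

```r
# Assumed header line with 2 tabs after DATE, and the first data line.
header    <- "DATE\t\tHEIGHT\tINPUT\tOUTPUT\tTESTMEASURE"
first_row <- "01/01/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING"

# Split each line on single tabs and count the resulting fields.
# The doubled tab produces an empty field, so the header has one extra.
n_fields <- lengths(strsplit(c(header, first_row), "\t", fixed = TRUE))
print(n_fields)  # [1] 6 5
```

A header with 6 fields cannot name 5 data columns, which is consistent with the "too few or too many items" message in the verbose log.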
Removing the extra tab between DATE and HEIGHT gives me:
DATE HEIGHT INPUT OUTPUT TESTMEASURE
1: 01/01/1933 NO RECORD NO RECORD MISSING MISSING
2: 01/02/1933 NO RECORD NO RECORD MISSING MISSING
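If you cannot edit the file, a possible workaround is to read the header line yourself, collapse the repeated tabs, and tell fread not to look for a header. The sketch below is untested against your real file: the temporary file is a hypothetical stand-in for it, and locating the header with grep("^DATE", ...) assumes no earlier line starts with DATE.

```r
library(data.table)

# Hypothetical stand-in for the real file (2 tabs after DATE in the header).
path <- tempfile(fileext = ".txt")
writeLines(c(
  rep("garbage", 12), "", "",
  "DATE\t\tHEIGHT\tINPUT\tOUTPUT\tTESTMEASURE",
  "01/01/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING",
  "01/02/1933\tNO RECORD\tNO RECORD\tMISSING\tMISSING"
), path)

# Find the header line and split it on runs of tabs, dropping the empty field.
raw        <- readLines(path)
header_idx <- grep("^DATE", raw)[1]
col_names  <- strsplit(raw[header_idx], "\t+")[[1]]

# Skip everything up to and including the header, and read every remaining
# line as data so the first data row is not consumed as column names.
dat <- fread(path, skip = header_idx, header = FALSE, sep = "\t")
setnames(dat, col_names)
print(dat)
```

header = FALSE matters here because every field is character, so the automatic header detection would otherwise promote the first data row to column names.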
Upvotes: 2