Reputation: 547
I have a fairly large .txt file ~9gb and I will like to load this txt file into postgres. The first row is the header, followed by all the data. If I postgres COPY the data directly, the header will cause an error that data type do not match with my postgres table, so I will need to remove it somehow.
Sample data: ProjectId,MailId,MailCodeId,prospectid,listid,datemailed,amount,donated,zip,zip4,VectorMajor,VectorMinor,packageid,phase,databaseid,amount2
15,53568419,89734,219906,15,2011-05-11 00:00:00,0,0,90720,2915,NonProfit,POLICY,230,3,1,0
16,84141863,87936,164657,243,2011-03-10 00:00:00,0,0,48362,2523,NonProfit,POLICY,1507,5,1,0
16,81442028,86632,15181625,243,2011-01-19 00:00:00,0,0,11501,2115,NonProfit,POLICY,1508,2,1,0
While the COPY function for postgres has the "header" setting that can ignore the first row, it only works for csv files:
copy training from 'C:/testCSV.csv' DELIMITER ',' csv header;
when I try to run the code above on my txt file, it gets an error:
copy training from 'C:/testTXTFile.txt' DELIMITER ',' csv header
ERROR: unquoted newline found in data
HINT: Use quoted CSV field to represent newline.
I have tried adding "quote" and "escape" attributes but the command just won't seem to work for txt file:
copy training from 'C:/testTXTFile.txt' DELIMITER ',' csv header quote as E'"' escape as E'\\N';
ERROR: COPY escape must be a single one-byte character
Alternatively, I thought about running java or create a seperate stagging table to remove the first row...but these solutions are expansive and time consuming. I will need to load 9gb of data just to remove the first row of headers... are there other solutions out there to remove the first row of a txt file easily so that I can load the data into my postgres database?
Upvotes: 46
Views: 55341
Reputation: 61
I've looked up docs at https://www.postgresql.org/docs/10/sql-copy.html
written about HEADER is not only true for CSV, but TSV also!
My solution was this in psql
\COPY mytable FROM 'mydata.tsv' DELIMITER E'\t' CSV HEADER;
(in addition mydata.tsv contaned header row which I excluded from copying to database table)
Upvotes: 3
Reputation: 4872
Use HEADER option with CSV option:
\copy <table_name> from '/source_file.csv' delimiter ',' CSV HEADER ;
HEADER Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.
Upvotes: 88