Ben
Ben

Reputation: 149

Copying untitled columns from tsv file to postgresql?

By tsv I mean a file delimited by tabs. I have a pretty large (6GB) data file that I have to import into a PostgreSQL database, and out of 56 columns, the first 8 are meaningful, then out of the other 48 there are several columns (like 7 or so) with 1's sparsely distributed with the rest being 0's. Is there a way to specify which columns in the file you want to copy into the table? If not, then I am fine with importing the whole file and just extracting the desired columns to use as data for my project, but I am concerned about allocating excessively large memory to a table in which less than 1/4 of the data is meaningful. Will this pose an issue, or will I be fine accommodating the meaningful columns into my table? I have considered using that table as a temp table and then importing the meaningful columns to another table, but I have been instructed to try to avoid doing an intermediary cleaning step, so I should be fine directly using the large table if it won't cause any problems in PostgreSQL.

Upvotes: 1

Views: 602

Answers (1)

Daniel Vérité
Daniel Vérité

Reputation: 61646

With PostgreSQL 9.3 or newer, COPY accepts a program as input . This option is precisely meant for that kind of pre-processing. For instance, to keep only tab-separated fields 1 to 4 and 7 from a TSV file, you could run:

COPY destination_table FROM PROGRAM 'cut -f1-4,7 /path/to/file' (format csv, delimiter '\t');

This also works with \copy in psql, in which case the program is executed client-side.

Upvotes: 2

Related Questions