Reputation: 4799
I'm relatively new to Postgres, but I'm getting a strange error and Googling hasn't turned up anything.
I've created two (relatively large) tables with about 200 million rows each. The first has 4 rows, the second 3 rows. They're each storing varchars in the each column.
I'm trying to create a third table by joining the two tables and selecting three of the resulting columns. The query is:
create table table_C as
(select table_A.id as id, table_A.predicate, table_B.object as type
from table_A join table_B on
table_A.subject = table_B.subject);
The query runs for about 10 hours, then aborts saying
ERROR: could not extend file "base/446695/504075.302": No space left on device
HINT: Check free disk space.
I'm running this on an Amazon EC2 instance with 64GB of ram, and my database is stored on a volume with ~500GB free. I realize that the resulting table should be large, but the original two tables only take up less than 100GB so it would be bizarre if a table with two columns took up more than 5x more space. I've tried about 4 times, and have tried rebooting the instance.
I double checked that both the data directory and the pg_stats_tmp file are pointed to the correct volume. (Both the "data_dir" and "pg_stat_tmp" settings in postgres.conf).
Any thoughts? Is there some other temporary file that might be ballooning somewhere (the root volume has over 150GB free anyway)?
Upvotes: 3
Views: 4790
Reputation: 2409
If the subjects are duplicated, the join will repeat the rows combinations.
E.g.
Table1
Table2
Result will be 6 hours with Subject1 and 2 rows with Subject2 (8 rows in total)
In the most extreme case - subjects are totally identical - you'll get 200 mln * 200 mln rows.
I think it's the reason...
Upvotes: 1