Richard

Reputation: 65550

Postgres: How to group a very large table and insert the results into a new table?

I am working in Postgres 9.6 with PostGIS. I have a very large table (about 30GB):

                                       Table "public.parcels"
    Column    |          Type           |                            Modifiers
--------------+-------------------------+------------------------------------------------------------------
 ogc_fid      | integer                 | not null default nextval('parcels_ogc_fid_seq'::regclass)
 wkb_geometry | geometry(Polygon,4326)  |
 county       | character varying       |

I would like to create a new table that contains one polygon per county, with the polygon being the union of that county's polygons. I have defined my new table as follows:

 gid      | integer                 | not null default nextval('newtable_gid_seq'::regclass)
 the_geom | geometry(Polygon,4326)  |

I know how to insert one county at a time from the first table into the second table:

INSERT INTO by_county(the_geom) VALUES
  ((SELECT ST_Union(wkb_geometry) FROM parcels where county='Warwick'));

But how can I do this programmatically for all counties? My current approach is to write a Python script with psycopg2 that first retrieves the DISTINCT county names and then runs the above command for each county, but I'm wondering if there's a less manual way.
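
For reference, a rough sketch of the loop I have in mind (the connection details below are placeholders):

import psycopg2

conn = psycopg2.connect(dbname="mydb", user="me")
with conn, conn.cursor() as cur:
    # Fetch the distinct county names, then union each county separately.
    cur.execute("SELECT DISTINCT county FROM parcels")
    counties = [row[0] for row in cur.fetchall()]
    for county in counties:
        cur.execute(
            "INSERT INTO by_county(the_geom) "
            "SELECT ST_Union(wkb_geometry) FROM parcels WHERE county = %s",
            (county,),
        )
conn.close()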

As noted this is a large dataset, and I'm working on a machine with 16GB of RAM. So it may be that my one-command-at-a-time approach is the best after all?

Upvotes: 1

Views: 472

Answers (2)

Deepak Puthraya

Reputation: 1435

You can try using the COPY command. Give this a shot:

psql -d <DATABASE> -c "copy (SELECT county, ST_Union(wkb_geometry) FROM parcels GROUP BY county) to STDOUT" | psql -d <DATABASE> -c "copy by_county(county, the_geom) from STDIN"

You will have to make sure the columns in the SELECT are in the same order as the columns of the target table.

Upvotes: 0

e4c5

Reputation: 53754

Since ST_Union is an aggregate function, you can do a GROUP BY in your SELECT:

INSERT INTO by_county(county, the_geom)
   SELECT county, ST_Union(wkb_geometry) FROM parcels GROUP BY county;

As noted this is a large dataset, and I'm working on a machine with 16GB of RAM. So it may be that my one-command-at-a-time approach is the best after all?

Probably not, because PostgreSQL is very good at handling large datasets. As a rule of thumb, one big query will be faster than a collection of small queries.
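
If you still want to drive this from your Python script, a minimal psycopg2 sketch (connection details are placeholders) reduces the whole job to a single execute:

import psycopg2

conn = psycopg2.connect(dbname="mydb", user="me")
with conn, conn.cursor() as cur:
    # One statement handles every county in a single pass over parcels.
    cur.execute(
        "INSERT INTO by_county(county, the_geom) "
        "SELECT county, ST_Union(wkb_geometry) FROM parcels GROUP BY county"
    )
conn.close()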

Upvotes: 1
