How can I generate big data sample for Postgresql using generate_series and random?

Question

I want to generate big data sample (almost 1 million records) for studying tuplesort.c's polyphase merge in postgresql, and I hope the schema as follows:

CREATE TABLE Departments (code VARCHAR(4), UNIQUE (code));
CREATE TABLE Towns (
  id SERIAL UNIQUE NOT NULL,
  code VARCHAR(10) NOT NULL, -- not unique
  article TEXT,
  name TEXT NOT NULL, -- not unique
  department VARCHAR(4) NOT NULL REFERENCES Departments (code),
  UNIQUE (code, department)
);

how to use generate_series and random for do it? thanks a lot!

Clodoaldo Neto · Accepted Answer

To insert one million rows into Towns

insert into towns (
    code, article, name, department
)
select
    left(md5(i::text), 10),
    md5(random()::text),
    md5(random()::text),
    left(md5(random()::text), 4)
from generate_series(1, 1000000) s(i)

Since id is a serial it is not necessary to include it.

Reference

left ( string text, n integer ) → text

Returns first n characters in the string, or when n is negative, returns all but last |n| characters.
left('abcde', 2) → ab

md5 ( text ) → text

Computes the MD5 hash of the argument, with the result written in hexadecimal.
md5('abc') → 900150983cd24fb0d6963f7d28e17f72

generate_series ( start integer, stop integer [, step integer ] ) → setof integer

Generates a series of values from start to stop, with a step size of step. step defaults to 1.

Refs

How can I generate big data sample for Postgresql using generate_series and random?

Answers (1)

Reference

Related Questions