Reputation: 41

Creating a Bin column in Postgres to check an integer and return a string

I have a large data set in a Postgres db and need to generate a field that groups rows into a respective bin for "0-100", "101-200", "201-300", etc. all the way up to nearly 5000. I am aware that I could manually update each row and produce a line of code for each bin like this:

update test
   set testgroup = '0-100' where testint >= 1 and distance < 100;

I really would like to figure out a more efficient way to do this, open to anything and everything! The main goal is to look at the integer in this 'testint' column and then if it is in between 1-100 return in the testgroup column "0-100".

Upvotes: 1

Answers (3)

Eric M.

Reputation: 2967

all the way up to nearly 5000

Sometimes the issue is that the upper bound is unknown so width_bucket may not be ideal in such case as the upper bound is required.

But an old school modulo may be enough:

-- Explicit
SELECT  testint - testint % 100 || '-' || testint - testint % 100 + 100
FROM    (
    VALUES
    (256),
    (543),
    (33),
    (5611)
) AS q (testint)

-- Less duplicate operations
SELECT  left_end || '-' || left_end + 100
FROM  (
    VALUES
    (256),
    (543),
    (33),
    (5611)
) AS q (testint)
JOIN LATERAL (SELECT testint - testint % 100) l(left_end) ON TRUE;

Both return:

Upvotes: 0

bfris

Reputation: 5805

Use the width_bucket function. See the the docs, but here is a short version of the syntax:

width_bucket(a, LBound, UBound, num_bins)

To get it to work properly for your bins, I have to add 1 to UBound. Some examples:

select width_bucket( 1, 0, 5001, 50) gives 1
select width_bucket(100, 0, 5001, 0) gives 1
select width_bucket(101, 0, 5001, 50) gives 2
select width_bucket(4900, 0, 5001, 50) gives 49
select width_bucket(4901, 0, 5001, 50) gives 50

So that works as expected. Next we need to generate the proper string. Pseudo format is

(width_bucket - 1)*100 || '-' || (width_bucket)*100

Where || is the SQL concatenation operator. Using the first example from before:

select (width_bucket(1, 0, 5001, 50)-1)*100 || ' - ' || width_bucket(1, 0, 5001, 50)*100

gives '0 - 100'

Sweet. Now putting it all together. First make a sandbox table you can use for testing. This will be a copy or partial copy of your data:

CREATE TABLE test
AS
SELECT * 
FROM original_table

Then add the new column to the table:

ALTER TABLE test
  ADD COLUMN testgroup text

Now the UPDATE statement:

UPDATE test
   SET testgroup = width_bucket(testint, 0, 5001, 50)-1)*100 || ' - ' || 
                   width_bucket(testint, 0, 5001, 50)*100

Upvotes: 4

JGH

Reputation: 17846

You can make use of generate_series to generate numbers from 0 to 50, and then to select the data between the generated values * 100 and the next generated value * 100. The same principle is used to build the bin name.

UPDATE test
SET testgroup = (x*100)+1 || '-' || (x+1)*100
FROM generate_series(0,50) f(x)
WHERE testint > (x*100) 
  AND testint <= ((x+1)*100);

http://rextester.com/FXIS37706

Upvotes: 0

Creating a Bin column in Postgres to check an integer and return a string

Answers (3)

Related Questions