Shuvayan Das
Shuvayan Das

Reputation: 1048

pulling selective data from database server in python based on conditions

I have the below data:

per_id      month_id    area_grp    area_hrs    level
754         201803      WNY         19.93       OVue
754         201802      MT          0.52        OVue
754         201802      WBS         0.34        OVue
754         201801      WYW         33.04       OVue
754         201801      VC          16.62       OVue
754         201801      PX          3.05        OVue
754         201712      RZN         5.05        OVue
754         201711      WYW         17.85       OVue
754         201711      NLN         0.8         OVue
754         201711      DJr         45.67       OVue
754         201711      TC          11.81       OVue
754         201710      MBN         1.61        OVue
754         201709      WAD         4.72        OVue
754         201709      DJr         23.82       OVue

**147           201803      WBC         13.44       OVue
147         201803      WBS         4.17        OVue
147         201803      WYW         1.80        OVue
147         201802      VD          5.14        OVue
147         201802      LNY         0.05        OVue
147         201801      LEIF        6.94        OVue
147         201801      WBC         25.35       OVue
147         201801      VD          8.53        OVue**
147         201712      MVR         7.17        OSver
147         201711      WDA         9.89        OSver
147         201711      YS          1.65        OSver
147         201711      N1T         3.02        OSver
147         201710      LEIF        1.83        OSver
147         201710      Hi          3.15        OVue
147         201710      TLD         1.43        OVue
147         201710      WBC         21.56       OSver
147         201710      NTT         2.88        OSver
147         201710      WBS         5.64        OSver
147         201710      LNY         2.67        OSver
147         201710      BRV         2.91        OSver
147         201710      N1T         3.78        OSver

Above is a sample of my data , sorted on the 1st and columns here. This data resides in a table in postgresql and I construct a query based on a parameter level.

level_list = ['OSver','Ovue']

for i in range(0,len(level_list)):
    sql_per_hours = str("""select * from table_name where level = %s ;""") \
                        %("'"+''.join(level_list[i]) + "'") 
    df_net = pd.read_sql(sql_per_hours,cnxn)

Now, for per_id 754 since he has worked under a single level all records for him should be fetched. But for people who have worked across more than 1 level. Only the latest level info should be taken. So for per_id 147 only the top 8 records must be selected to be fetched. latest level is the value in level for each per_id where the month_id is max. So from that row , till there is a break in the value, rows are taken.

How can I construct this sql query and call it from python? This will also help me reduce the data I am pulling for each level in level_list.

Expected Output as df_net:

per_id      month_id    area_grp    area_hrs    level
754         201803      WNY         19.93       OVue
754         201802      MT          0.52        OVue
754         201802      WBS         0.34        OVue
754         201801      WYW         33.04       OVue
754         201801      VC          16.62       OVue
754         201801      PX          3.05        OVue
754         201712      RZN         5.05        OVue
754         201711      WYW         17.85       OVue
754         201711      NLN         0.8         OVue
754         201711      DJr         45.67       OVue
754         201711      TC          11.81       OVue
754         201710      MBN         1.61        OVue
754         201709      WAD         4.72        OVue
754         201709      DJr         23.82       OVue
147         201803      WBC         13.44       OVue
147         201803      WBS         4.17        OVue
147         201803      WYW         1.80        OVue
147         201802      VD          5.14        OVue
147         201802      LNY         0.05        OVue
147         201801      LEIF        6.94        OVue
147         201801      WBC         25.35       OVue
147         201801      VD          8.53        OVue

Can someone please help me with this?

Upvotes: 0

Views: 29

Answers (1)

Santhosh
Santhosh

Reputation: 81

This has more to do with SQL than python. First get the last level corresponding to each per_id and use that information to get the required rows.

select table_name.* -- get all records for last level
from 
    table_name
left join
    (select distinct per_id, level from -- get latest level
        table_name tn
    left join
        (select
            per_id,
            max(month_id) as latest_month -- get latest month
        from
            table_name 
        group by per_id) sub1
    on 
        tn.per_id = sub1.per_id 
        and tn.month_id = sub1.latest_month)sub2
on 
    table_name.per_id = sub2.per_id
    and table_name.level = sub2.level

You can also explore using window functions.

Upvotes: 1

Related Questions