Reputation: 1048
I have the below data:
per_id month_id area_grp area_hrs level
754 201803 WNY 19.93 OVue
754 201802 MT 0.52 OVue
754 201802 WBS 0.34 OVue
754 201801 WYW 33.04 OVue
754 201801 VC 16.62 OVue
754 201801 PX 3.05 OVue
754 201712 RZN 5.05 OVue
754 201711 WYW 17.85 OVue
754 201711 NLN 0.8 OVue
754 201711 DJr 45.67 OVue
754 201711 TC 11.81 OVue
754 201710 MBN 1.61 OVue
754 201709 WAD 4.72 OVue
754 201709 DJr 23.82 OVue
**147 201803 WBC 13.44 OVue
147 201803 WBS 4.17 OVue
147 201803 WYW 1.80 OVue
147 201802 VD 5.14 OVue
147 201802 LNY 0.05 OVue
147 201801 LEIF 6.94 OVue
147 201801 WBC 25.35 OVue
147 201801 VD 8.53 OVue**
147 201712 MVR 7.17 OSver
147 201711 WDA 9.89 OSver
147 201711 YS 1.65 OSver
147 201711 N1T 3.02 OSver
147 201710 LEIF 1.83 OSver
147 201710 Hi 3.15 OVue
147 201710 TLD 1.43 OVue
147 201710 WBC 21.56 OSver
147 201710 NTT 2.88 OSver
147 201710 WBS 5.64 OSver
147 201710 LNY 2.67 OSver
147 201710 BRV 2.91 OSver
147 201710 N1T 3.78 OSver
Above is a sample of my data , sorted on the 1st and columns here. This data resides in a table in postgresql and I construct a query based on a parameter level.
level_list = ['OSver','Ovue']
for i in range(0,len(level_list)):
sql_per_hours = str("""select * from table_name where level = %s ;""") \
%("'"+''.join(level_list[i]) + "'")
df_net = pd.read_sql(sql_per_hours,cnxn)
Now, for per_id 754 since he has worked under a single level all records for him should be fetched. But for people who have worked across more than 1 level. Only the latest level info should be taken. So for per_id 147 only the top 8 records must be selected to be fetched. latest level is the value in level for each per_id where the month_id is max. So from that row , till there is a break in the value, rows are taken.
How can I construct this sql query and call it from python? This will also help me reduce the data I am pulling for each level in level_list.
Expected Output as df_net:
per_id month_id area_grp area_hrs level
754 201803 WNY 19.93 OVue
754 201802 MT 0.52 OVue
754 201802 WBS 0.34 OVue
754 201801 WYW 33.04 OVue
754 201801 VC 16.62 OVue
754 201801 PX 3.05 OVue
754 201712 RZN 5.05 OVue
754 201711 WYW 17.85 OVue
754 201711 NLN 0.8 OVue
754 201711 DJr 45.67 OVue
754 201711 TC 11.81 OVue
754 201710 MBN 1.61 OVue
754 201709 WAD 4.72 OVue
754 201709 DJr 23.82 OVue
147 201803 WBC 13.44 OVue
147 201803 WBS 4.17 OVue
147 201803 WYW 1.80 OVue
147 201802 VD 5.14 OVue
147 201802 LNY 0.05 OVue
147 201801 LEIF 6.94 OVue
147 201801 WBC 25.35 OVue
147 201801 VD 8.53 OVue
Can someone please help me with this?
Upvotes: 0
Views: 29
Reputation: 81
This has more to do with SQL than python. First get the last level corresponding to each per_id
and use that information to get the required rows.
select table_name.* -- get all records for last level
from
table_name
left join
(select distinct per_id, level from -- get latest level
table_name tn
left join
(select
per_id,
max(month_id) as latest_month -- get latest month
from
table_name
group by per_id) sub1
on
tn.per_id = sub1.per_id
and tn.month_id = sub1.latest_month)sub2
on
table_name.per_id = sub2.per_id
and table_name.level = sub2.level
You can also explore using window functions.
Upvotes: 1