Reputation: 89
I have got a very large Pandas Data Frame
. A part of which looks like this:
Rule_Name Rule_Seq_No Condition Expression Type
Rule P 1 ID 19909 Action
Rule P 1 Type A Condition
Rule P 1 System B Condition
Rule P 2 ID 19608 Action
Rule P 2 Type A Condition
Rule P 2 System C Condition
Rule S 1 ID 19909 Action
Rule S 1 Type A Condition
Rule S 1 System M Condition
Rule S 2 ID 19608 Action
Rule S 2 Type C Condition
Rule S 2 System F Condition
This table contains some rules with sequence numbers.
I tried using different functions such as MERGE
, GROUP BY
, APPLY
but I am not getting the desired output.
The expected output should be something like this:
Rule_Name Rule_Seq_No Condition Action
Rule P 1 (Type=A)and(System=B) 19909
Rule P 2 (Type=A)and(System=C) 19608
Rule S 1 (Type=A)and(System=M) 19909
Rule S 2 (Type=A)and(System=F) 19608
For the same rule and the same sequence number and where the TYPE
is Condition
, I want to merge the rows. And, where the TYPE
is ACTION
, it should show in a separate column.
Upvotes: 2
Views: 44
Reputation: 862711
Use:
df1 = (df.assign(Condition = '(' + df['Condition'] + '=' + df['Expression'] + ')')
.groupby(['Rule_Name','Rule_Seq_No','Type'])
.agg({'Condition': 'and'.join, 'Expression':'first'})
.unstack()
.drop([('Condition','Action'), ('Expression','Condition')], axis=1)
.droplevel(axis=1, level=0)
.reset_index()
.rename_axis(None, axis=1))
print (df1)
Rule_Name Rule_Seq_No Condition Action
0 Rule P 1 (Type=A)and(System=B) 19909
1 Rule P 2 (Type=A)and(System=C) 19608
2 Rule S 1 (Type=A)and(System=M) 19909
3 Rule S 2 (Type=C)and(System=F) 19608
Explanation:
Condition
and Expression
with =
and add ()
GroupBy.agg
with join
and first
DataFrame.unstack
DataFrame.drop
with tuples, because MultiIndex
MultiIndex
by DataFrame.droplevel
DataFrame.reset_index
and DataFrame.rename_axis
EDIT:
Solution for oldier pandas versions (below 0.24+) with Index.droplevel
:
df1 = (df.assign(Condition = '(' + df['Condition'] + '=' + df['Expression'] + ')')
.groupby(['Rule_Name','Rule_Seq_No','Type'])
.agg({'Condition': 'and'.join, 'Expression':'first'})
.unstack()
.drop([('Condition','Action'), ('Expression','Condition')], axis=1))
df1.columns = df1.columns.droplevel(level=0)
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
Rule_Name Rule_Seq_No Condition Action
0 Rule P 1 (Type=A)and(System=B) 19909
1 Rule P 2 (Type=A)and(System=C) 19608
2 Rule S 1 (Type=A)and(System=M) 19909
3 Rule S 2 (Type=C)and(System=F) 19608
Upvotes: 5