John Doe
John Doe

Reputation: 89

How do I get this output?

I have got a very large Pandas Data Frame. A part of which looks like this:

 Rule_Name Rule_Seq_No  Condition Expression  Type   

Rule P     1            ID         19909       Action      
Rule P     1            Type       A           Condition   
Rule P     1            System     B           Condition   
Rule P     2            ID         19608       Action      
Rule P     2            Type       A           Condition  
Rule P     2            System     C           Condition   
Rule S     1            ID         19909       Action      
Rule S     1            Type       A           Condition   
Rule S     1            System     M           Condition   
Rule S     2            ID         19608       Action     
Rule S     2            Type       C           Condition   
Rule S     2            System     F           Condition 

This table contains some rules with sequence numbers.

I tried using different functions such as MERGE, GROUP BY, APPLY but I am not getting the desired output.

The expected output should be something like this:

 Rule_Name  Rule_Seq_No        Condition          Action  

Rule P       1            (Type=A)and(System=B)    19909   
Rule P       2            (Type=A)and(System=C)    19608   
Rule S       1            (Type=A)and(System=M)    19909   
Rule S       2            (Type=A)and(System=F)    19608 

For the same rule and the same sequence number and where the TYPE is Condition, I want to merge the rows. And, where the TYPE is ACTION, it should show in a separate column.

Upvotes: 2

Views: 44

Answers (1)

jezrael
jezrael

Reputation: 862711

Use:

df1 = (df.assign(Condition = '(' + df['Condition'] + '=' + df['Expression'] + ')')
         .groupby(['Rule_Name','Rule_Seq_No','Type'])
         .agg({'Condition': 'and'.join, 'Expression':'first'})
         .unstack()
         .drop([('Condition','Action'), ('Expression','Condition')], axis=1)
         .droplevel(axis=1, level=0)
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
  Rule_Name  Rule_Seq_No              Condition Action
0    Rule P            1  (Type=A)and(System=B)  19909
1    Rule P            2  (Type=A)and(System=C)  19608
2    Rule S            1  (Type=A)and(System=M)  19909
3    Rule S            2  (Type=C)and(System=F)  19608

Explanation:

  1. Join columns Condition and Expression with = and add ()
  2. Aggreagate by GroupBy.agg with join and first
  3. Reshape by DataFrame.unstack
  4. Remove unnecessary columns by DataFrame.drop with tuples, because MultiIndex
  5. Remove top level of MultiIndex by DataFrame.droplevel
  6. Data cleaning by DataFrame.reset_index and DataFrame.rename_axis

EDIT:

Solution for oldier pandas versions (below 0.24+) with Index.droplevel:

df1 = (df.assign(Condition = '(' + df['Condition'] + '=' + df['Expression'] + ')')
         .groupby(['Rule_Name','Rule_Seq_No','Type'])
         .agg({'Condition': 'and'.join, 'Expression':'first'})
         .unstack()
         .drop([('Condition','Action'), ('Expression','Condition')], axis=1))

df1.columns = df1.columns.droplevel(level=0)
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
  Rule_Name  Rule_Seq_No              Condition Action
0    Rule P            1  (Type=A)and(System=B)  19909
1    Rule P            2  (Type=A)and(System=C)  19608
2    Rule S            1  (Type=A)and(System=M)  19909
3    Rule S            2  (Type=C)and(System=F)  19608

Upvotes: 5

Related Questions