Tommy
Tommy

Reputation: 515

Efficient/Pythonic way to Filter pandas DataFrame based on priority

I have below dataframe.

+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
|  Inv-001  |     A    |  2  |
+-----------+----------+-----+
|  Inv-001  |     B    |  3  |
+-----------+----------+-----+
|  Inv-001  |     C    |  1  |
+-----------+----------+-----+
|  Inv-002  |     B    |  3  |
+-----------+----------+-----+
|  Inv-002  |     D    |  4  |
+-----------+----------+-----+
|  Inv-003  |     C    |  3  |
+-----------+----------+-----+
|  Inv-003  |     D    |  9  |
+-----------+----------+-----+
|  Inv-004  |     D    |  5  |
+-----------+----------+-----+
|  Inv-004  |     E    |  8  |
+-----------+----------+-----+
|  Inv-005  |     X    |  2  |
+-----------+----------+-----+

my task is to create an additional column Type based on the priority of the item occurrence.

eg: ItemCode A has 1st Priority. then B has 2nd priority and C has 3rd priority. rest of the items has least priority and classified has Other.

So, if any Invoice contains item A, the type should be Type - A irrespective other items presence. from the balance Invoices if item B contains, then the type should be Type - B. same for C. if none of A, B or C is not present in any invoice, then the type should be Type - Other.

Below is my desired output.

+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty |     Type     |
+-----------+----------+-----+--------------+
|  Inv-001  |     A    |  2  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     B    |  3  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     C    |  1  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-002  |     B    |  3  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-002  |     D    |  4  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-003  |     C    |  3  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-003  |     D    |  9  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-004  |     D    |  5  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-004  |     E    |  8  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-005  |     X    |  2  | Type - Other |
+-----------+----------+-----+--------------+

Below is my code and it works. But, it is more cumbersome and not pythonic at all.

# load Dataframe
df = pd.read_excel() 

# filter data containing `A`
mask_A = (df['ItemCode'] == 'A').groupby(df['InvoiceNo']).transform('any')
df_A = df[mask_A]
df_A['Type'] = 'Type - A'

# form the rest of the data, filter data containing `B`
df = df[~mask_A]
mask_B = (df['ItemCode'] == 'B').groupby(df['InvoiceNo']).transform('any')
df_B = df[mask_B]
df_B['Type'] = 'Type - B'

# form the rest of the data, filter data containing `c`
df = df[~mask_B]
mask_C = (df['ItemCode'] == 'C').groupby(df['InvoiceNo']).transform('any')
df_C = df[mask_C]
df_C['Type'] = 'Type - C'

# form the rest of the data, filter data doesnt contain `A, B or C`
df_Other = df[~mask_C]
df_Other['Type'] = 'Type - Other'

# Conctenate all the dataframes
df = pd.concat([df_A, df_B, df_C, df_Other], axis=0,sort=False)

Now, what is the most efficient and pythonic way to do this?

Upvotes: 1

Views: 91

Answers (1)

BENY
BENY

Reputation: 323226

I feel like we can do Categorical then transform

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)

df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('min').fillna('other')

Update

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)
df=df.sort_values('Type')
df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('first').fillna('other')
df=df.sort_index()

df
Out[32]: 
     InvoiceNo ItemCode  Qty        Type
0    Inv-001          A    2      Type_A
1    Inv-001          B    3      Type_A
2    Inv-001          C    1      Type_A
3    Inv-002          B    3      Type_B
4    Inv-002          D    4      Type_B
5    Inv-003          C    3      Type_C
6    Inv-003          D    9      Type_C
7    Inv-004          D    5  Type_other
8    Inv-004          E    8  Type_other
9    Inv-005          X    2  Type_other

Upvotes: 2

Related Questions