Reputation: 515
I have below dataframe.
+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
| Inv-001 | A | 2 |
+-----------+----------+-----+
| Inv-001 | B | 3 |
+-----------+----------+-----+
| Inv-001 | C | 1 |
+-----------+----------+-----+
| Inv-002 | B | 3 |
+-----------+----------+-----+
| Inv-002 | D | 4 |
+-----------+----------+-----+
| Inv-003 | C | 3 |
+-----------+----------+-----+
| Inv-003 | D | 9 |
+-----------+----------+-----+
| Inv-004 | D | 5 |
+-----------+----------+-----+
| Inv-004 | E | 8 |
+-----------+----------+-----+
| Inv-005 | X | 2 |
+-----------+----------+-----+
my task is to create an additional column Type
based on the priority of the item occurrence.
eg: ItemCode A
has 1st
Priority. then B
has 2nd
priority and C
has 3rd
priority. rest of the items has least
priority and classified has Other
.
So, if any Invoice contains item A
, the type should be Type - A
irrespective other items presence. from the balance Invoices if item B
contains, then the type should be Type - B
. same for C
. if none of A, B or C
is not present in any invoice, then the type should be Type - Other
.
Below is my desired output.
+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty | Type |
+-----------+----------+-----+--------------+
| Inv-001 | A | 2 | Type - A |
+-----------+----------+-----+--------------+
| Inv-001 | B | 3 | Type - A |
+-----------+----------+-----+--------------+
| Inv-001 | C | 1 | Type - A |
+-----------+----------+-----+--------------+
| Inv-002 | B | 3 | Type - B |
+-----------+----------+-----+--------------+
| Inv-002 | D | 4 | Type - B |
+-----------+----------+-----+--------------+
| Inv-003 | C | 3 | Type - C |
+-----------+----------+-----+--------------+
| Inv-003 | D | 9 | Type - C |
+-----------+----------+-----+--------------+
| Inv-004 | D | 5 | Type - Other |
+-----------+----------+-----+--------------+
| Inv-004 | E | 8 | Type - Other |
+-----------+----------+-----+--------------+
| Inv-005 | X | 2 | Type - Other |
+-----------+----------+-----+--------------+
Below is my code and it works. But, it is more cumbersome and not pythonic
at all.
# load Dataframe
df = pd.read_excel()
# filter data containing `A`
mask_A = (df['ItemCode'] == 'A').groupby(df['InvoiceNo']).transform('any')
df_A = df[mask_A]
df_A['Type'] = 'Type - A'
# form the rest of the data, filter data containing `B`
df = df[~mask_A]
mask_B = (df['ItemCode'] == 'B').groupby(df['InvoiceNo']).transform('any')
df_B = df[mask_B]
df_B['Type'] = 'Type - B'
# form the rest of the data, filter data containing `c`
df = df[~mask_B]
mask_C = (df['ItemCode'] == 'C').groupby(df['InvoiceNo']).transform('any')
df_C = df[mask_C]
df_C['Type'] = 'Type - C'
# form the rest of the data, filter data doesnt contain `A, B or C`
df_Other = df[~mask_C]
df_Other['Type'] = 'Type - Other'
# Conctenate all the dataframes
df = pd.concat([df_A, df_B, df_C, df_Other], axis=0,sort=False)
Now, what is the most efficient
and pythonic
way to do this?
Upvotes: 1
Views: 91
Reputation: 323226
I feel like we can do Categorical
then transform
df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)
df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('min').fillna('other')
Update
df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)
df=df.sort_values('Type')
df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('first').fillna('other')
df=df.sort_index()
df
Out[32]:
InvoiceNo ItemCode Qty Type
0 Inv-001 A 2 Type_A
1 Inv-001 B 3 Type_A
2 Inv-001 C 1 Type_A
3 Inv-002 B 3 Type_B
4 Inv-002 D 4 Type_B
5 Inv-003 C 3 Type_C
6 Inv-003 D 9 Type_C
7 Inv-004 D 5 Type_other
8 Inv-004 E 8 Type_other
9 Inv-005 X 2 Type_other
Upvotes: 2