Blue Moon
Blue Moon

Reputation: 189

Pandas: Adding new rows depending on a group aggregation

Please help!

There are a total of 5 Rejection Codes: EL1, EL2, EL3, EL4, and EL5. I want to append new rows so that each ID has 5 reject codes always.

Here's my original DF:

+----+-------------+-----+
| ID | Reject Code | QTY |
+----+-------------+-----+
| A  | EL1         |   7 |
| A  | EL2         |   2 |
| A  | EL3         |  33 |
| B  | EL1         |   7 |
| B  | EL2         |   1 |
| B  | EL3         |   7 |
| B  | EL4         |  36 |
| B  | EL5         |   5 |
| C  | EL1         |   3 |
| C  | EL2         |  32 |
+----+-------------+-----+

ID B has all the rejection codes, so we don't need to add any row to this.

ID A only has EL1, EL2, and EL3, so how can I append 2 new rows and set EL4 and EL5 as 0 QTY?

The same with ID C, I need to append 3 rows, EL3, EL4, and EL5?

I want to output to be:

+----+-------------+-----+
| ID | Reject Code | QTY |
+----+-------------+-----+
| A  | EL1         |   7 |
| A  | EL2         |   2 |
| A  | EL3         |  33 |
| A  | EL4         |   0 |
| A  | EL5         |   0 |
| B  | EL1         |   7 |
| B  | EL2         |   1 |
| B  | EL3         |   7 |
| B  | EL4         |  36 |
| B  | EL5         |   5 |
| C  | EL1         |   3 |
| C  | EL2         |  32 |
| C  | EL3         |   0 |
| C  | EL4         |   0 |
| C  | EL5         |   0 |
+----+-------------+-----+

Upvotes: 1

Views: 69

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Let's use pd.MultiIndexes and from_products to create missing rows:

indx = pd.MultiIndex.from_product([df['ID'].unique(), 
                                   df['Reject Code'].unique()], 
                                  names=['ID', 'Reject Code'])

df.set_index(['ID','Reject Code']).reindex(indx, fill_value=0).reset_index()

Output:

      ID    Reject Code  QTY
0    A     EL1             7
1    A     EL2             2
2    A     EL3            33
3    A     EL4             0
4    A     EL5             0
5    B     EL1             7
6    B     EL2             1
7    B     EL3             7
8    B     EL4            36
9    B     EL5             5
10   C     EL1             3
11   C     EL2            32
12   C     EL3             0
13   C     EL4             0
14   C     EL5             0

Of not, all 'Reject Code' are in dataframe, then you can use a list for df['Reject Codes'].unqiue(),

indx = pd.MultiIndex.from_product([df['ID'].unique(), ['EL1', 'EL2', 'EL3', 'EL4', 'EL5'], names=['ID', 'Reject Code'])

Upvotes: 3

Quang Hoang
Quang Hoang

Reputation: 150745

You can do pivot_table, then melt/stack:

# all reject codes
Rej_Codes = [f'EL{i+1}' for i in range(5)]

(df.pivot_table(index='ID', 
               columns='Reject Code', 
               values='QTY', 
               fill_value=0)
   .reindex(Rej_Codes, axis=1, fill_value=0) # as pointed out by rafaelc
   .stack()
   .reset_index(name='QTY')
)

Output:

   ID Reject Code  QTY
0   A         EL1    7
1   A         EL2    2
2   A         EL3   33
3   A         EL4    0
4   A         EL5    0
5   B         EL1    7
6   B         EL2    1
7   B         EL3    7
8   B         EL4   36
9   B         EL5    5
10  C         EL1    3
11  C         EL2   32
12  C         EL3    0
13  C         EL4    0
14  C         EL5    0

Upvotes: 5

Related Questions