Blue Moon
Blue Moon

Reputation: 189

How do i filter panda dataframe and keep only row based on other column and other conditions

Below is the dataframe I have as an example:

+--------------+-------+-------------+--------------+----------+-----------+
|      ID      | Part  | RequestFrom | QTYRequested | Location | QTYOnHand |
+--------------+-------+-------------+--------------+----------+-----------+
| PartACity 1  | PartA | City 1      |            1 | LocA     |         2 |
| PartACity 2  | PartA | City 2      |            1 | LocA     |         2 |
| PartACity 3  | PartA | City 3      |            1 | LocA     |         2 |
| PartACity 4  | PartA | City 4      |            1 | LocA     |         2 |
| PartACity 5  | PartA | City 5      |            1 | LocA     |         2 |
| PartACity 6  | PartA | City 6      |            1 | LocA     |         2 |
| PartACity 7  | PartA | City 7      |            1 | LocA     |         2 |
| PartACity 8  | PartA | City 8      |            1 | LocA     |         2 |
| PartACity 9  | PartA | City 9      |            1 | LocA     |         2 |
| PartACity 10 | PartA | City 10     |            1 | LocA     |         2 |
| PartACity 1  | PartA | City 1      |            1 | LocB     |         3 |
| PartACity 2  | PartA | City 2      |            1 | LocB     |         3 |
| PartACity 3  | PartA | City 3      |            1 | LocB     |         3 |
| PartACity 4  | PartA | City 4      |            1 | LocB     |         3 |
| PartACity 5  | PartA | City 5      |            1 | LocB     |         3 |
| PartACity 6  | PartA | City 6      |            1 | LocB     |         3 |
| PartACity 7  | PartA | City 7      |            1 | LocB     |         3 |
| PartACity 8  | PartA | City 8      |            1 | LocB     |         3 |
| PartACity 9  | PartA | City 9      |            1 | LocB     |         3 |
| PartACity 10 | PartA | City 10     |            1 | LocB     |         3 |
| PartACity 1  | PartA | City 1      |            1 | LocC     |         4 |
| PartACity 2  | PartA | City 2      |            1 | LocC     |         4 |
| PartACity 3  | PartA | City 3      |            1 | LocC     |         4 |
| PartACity 4  | PartA | City 4      |            1 | LocC     |         4 |
| PartACity 5  | PartA | City 5      |            1 | LocC     |         4 |
| PartACity 6  | PartA | City 6      |            1 | LocC     |         4 |
| PartACity 7  | PartA | City 7      |            1 | LocC     |         4 |
| PartACity 8  | PartA | City 8      |            1 | LocC     |         4 |
| PartACity 9  | PartA | City 9      |            1 | LocC     |         4 |
| PartACity 10 | PartA | City 10     |            1 | LocC     |         4 |
+--------------+-------+-------------+--------------+----------+-----------+

I want to turn the above dataframe into this:

+-------------+-------+-------------+--------------+----------+-----------+
|     ID      | Part  | RequestFrom | QTYRequested | Location | QTYOnHand |
+-------------+-------+-------------+--------------+----------+-----------+
| PartACity 1 | PartA | City 1      |            1 | LocA     |         2 |
| PartACity 2 | PartA | City 2      |            1 | LocA     |         2 |
| PartACity 3 | PartA | City 3      |            1 | LocB     |         3 |
| PartACity 4 | PartA | City 4      |            1 | LocB     |         3 |
| PartACity 5 | PartA | City 5      |            1 | LocB     |         3 |
| PartACity 6 | PartA | City 6      |            1 | LocC     |         4 |
| PartACity 7 | PartA | City 7      |            1 | LocC     |         4 |
| PartACity 8 | PartA | City 8      |            1 | LocC     |         4 |
| PartACity 9 | PartA | City 9      |            1 | LocC     |         4 |
+-------------+-------+-------------+--------------+----------+-----------+

As you can see, The total QTYOnHand are 9, but we have 10 open request for Part A.

I want to find a better way to allocate the quantity.

Since LocA only has two quantity of PartA, so we only keep the top two rows.

LocB has 3 quantity of PartA, the next 3 quantity will be allocated to LocB.

LocC has 4 quantity of PartA, the next 4 quantity will be allocated to LocC.

Any help would be greatly appreciated!!!

Upvotes: 4

Views: 102

Answers (1)

user5305519
user5305519

Reputation: 3206

Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import pandas as pd
>>> df = pd.DataFrame({
    'ID' : ['PartACity 1', 'PartACity 2', 'PartACity 3', 'PartACity 4', 'PartACity 5', 'PartACity 6', 'PartACity 7', 'PartACity 8', 'PartACity 9', 'PartACity 10', 'PartACity 1', 'PartACity 2', 'PartACity 3', 'PartACity 4', 'PartACity 5', 'PartACity 6', 'PartACity 7', 'PartACity 8', 'PartACity 9', 'PartACity 10', 'PartACity 1', 'PartACity 2', 'PartACity 3', 'PartACity 4', 'PartACity 5', 'PartACity 6', 'PartACity 7', 'PartACity 8', 'PartACity 9', 'PartACity 10'],
    'Part' : ['PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA'],
    'RequestFrom': ['City 1', 'City 2', 'City 3', 'City 4', 'City 5', 'City 6', 'City 7', 'City 8', 'City 9', 'City 10', 'City 1', 'City 2', 'City 3', 'City 4', 'City 5', 'City 6', 'City 7', 'City 8', 'City 9', 'City 10', 'City 1', 'City 2', 'City 3', 'City 4', 'City 5', 'City 6', 'City 7', 'City 8', 'City 9', 'City 10'],
    'QTYRequested': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'Location': ['LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC'],
    'QTYOnHand': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
    })
>>> print(df)
              ID Location     ...      QTYRequested  RequestFrom
0    PartACity 1     LocA     ...                 1       City 1
1    PartACity 2     LocA     ...                 1       City 2
2    PartACity 3     LocA     ...                 1       City 3
3    PartACity 4     LocA     ...                 1       City 4
4    PartACity 5     LocA     ...                 1       City 5
5    PartACity 6     LocA     ...                 1       City 6
6    PartACity 7     LocA     ...                 1       City 7
7    PartACity 8     LocA     ...                 1       City 8
8    PartACity 9     LocA     ...                 1       City 9
9   PartACity 10     LocA     ...                 1      City 10
10   PartACity 1     LocB     ...                 1       City 1
11   PartACity 2     LocB     ...                 1       City 2
12   PartACity 3     LocB     ...                 1       City 3
13   PartACity 4     LocB     ...                 1       City 4
14   PartACity 5     LocB     ...                 1       City 5
15   PartACity 6     LocB     ...                 1       City 6
16   PartACity 7     LocB     ...                 1       City 7
17   PartACity 8     LocB     ...                 1       City 8
18   PartACity 9     LocB     ...                 1       City 9
19  PartACity 10     LocB     ...                 1      City 10
20   PartACity 1     LocC     ...                 1       City 1
21   PartACity 2     LocC     ...                 1       City 2
22   PartACity 3     LocC     ...                 1       City 3
23   PartACity 4     LocC     ...                 1       City 4
24   PartACity 5     LocC     ...                 1       City 5
25   PartACity 6     LocC     ...                 1       City 6
26   PartACity 7     LocC     ...                 1       City 7
27   PartACity 8     LocC     ...                 1       City 8
28   PartACity 9     LocC     ...                 1       City 9
29  PartACity 10     LocC     ...                 1      City 10

[30 rows x 6 columns]

Duplicate df as temp_df to aggregate the quantity on hand and keep track of the quantity left for each location by creating a new column QTYLeft:

>>> temp_df = df
>>> temp_df = temp_df.groupby('Location').agg({'QTYOnHand':'first'})
>>> temp_df = temp_df.reset_index()
>>> temp_df['QTYLeft'] =temp_df['QTYOnHand']
>>> print(temp_df)
  Location  QTYOnHand  QTYLeft
0     LocA          2        2
1     LocB          3        3
2     LocC          4        4

Group df by ID, Part, RequestFrom:

>>> df = df.groupby(['ID', 'Part', 'RequestFrom']).first()
>>> df = df.reset_index()
>>> print(df)
             ID   Part     ...      QTYOnHand QTYRequested
0   PartACity 1  PartA     ...              2            1
1  PartACity 10  PartA     ...              2            1
2   PartACity 2  PartA     ...              2            1
3   PartACity 3  PartA     ...              2            1
4   PartACity 4  PartA     ...              2            1
5   PartACity 5  PartA     ...              2            1
6   PartACity 6  PartA     ...              2            1
7   PartACity 7  PartA     ...              2            1
8   PartACity 8  PartA     ...              2            1
9   PartACity 9  PartA     ...              2            1

[10 rows x 6 columns]

Values in ID column are strings and thus cannot be used as an index to sort according to ascending numbers, thus we create a new temporary index called temp_index first, sort the df in ascending order, then remove said index:

>>> df = df.assign(temp_index=[int(float(i.split(' ')[-1])) for i in df['ID']])
>>> df = df.sort_values(by='temp_index')
>>> print(df)
             ID   Part    ...     QTYRequested temp_index
0   PartACity 1  PartA    ...                1          1
2   PartACity 2  PartA    ...                1          2
3   PartACity 3  PartA    ...                1          3
4   PartACity 4  PartA    ...                1          4
5   PartACity 5  PartA    ...                1          5
6   PartACity 6  PartA    ...                1          6
7   PartACity 7  PartA    ...                1          7
8   PartACity 8  PartA    ...                1          8
9   PartACity 9  PartA    ...                1          9
1  PartACity 10  PartA    ...                1         10

[10 rows x 7 columns]
>>> del df['temp_index']

Create a new user-defined function (UDF) and apply it to allocate the available quantity per location, with the smaller indexes being allocated first as per your question:

>>> def allocate_qty(row):
    global temp_df
    try:
        temp_df = temp_df[(temp_df['QTYLeft'] != 0)]
        avail_qty = temp_df['QTYOnHand'].values[0]
        avail_location = temp_df['Location'].values[0]
        temp_df['QTYLeft'].values[0] = temp_df['QTYLeft'].values[0] - row['QTYRequested']
        return avail_location, avail_qty
    except:
        return 'Not Allocated', 0


>>> df['Location'], df['QTYOnHand'] = zip(*df.apply(allocate_qty, axis=1))
>>> print(df)
             ID   Part     ...      QTYOnHand QTYRequested
0   PartACity 1  PartA     ...              2            1
2   PartACity 2  PartA     ...              2            1
3   PartACity 3  PartA     ...              3            1
4   PartACity 4  PartA     ...              3            1
5   PartACity 5  PartA     ...              3            1
6   PartACity 6  PartA     ...              4            1
7   PartACity 7  PartA     ...              4            1
8   PartACity 8  PartA     ...              4            1
9   PartACity 9  PartA     ...              4            1
1  PartACity 10  PartA     ...              0            1

[10 rows x 6 columns]

Filter out rows which did not manage to be allocated the resources:

>>> df = df[(df['Location'] != 'Not Allocated')]
>>> print(df)
            ID   Part     ...      QTYOnHand QTYRequested
0  PartACity 1  PartA     ...              2            1
2  PartACity 2  PartA     ...              2            1
3  PartACity 3  PartA     ...              3            1
4  PartACity 4  PartA     ...              3            1
5  PartACity 5  PartA     ...              3            1
6  PartACity 6  PartA     ...              4            1
7  PartACity 7  PartA     ...              4            1
8  PartACity 8  PartA     ...              4            1
9  PartACity 9  PartA     ...              4            1

[9 rows x 6 columns]

Hope this helps!

Upvotes: 1

Related Questions