Reputation: 149
Hi I have a large dataset in excel (and it will keep growing) of products' orders.
The dataset look like this
Product Date Lsat24 Next24 Summary Buyer
*day1*
AX1 | 2/1/2019 |Checking inventory |Invoicing |The product request started today | HTN
X5L | 2/1/2019 |Inventory counting |Future request planing|No operation update | null
NO6 | 2/1/2019 |Request in-process |Closing request |Operation product request ended | KLW
ZS5 | 2/1/2019 |Order placed |Processing request |Request started for the product | WHM
*day2*
AX1 | 3/1/2019 |Invoicing the order |Logistic planning |Product requested still pending | HTN
X5L | 3/1/2019 |New request planing |processing order |A new product request started | LKJ
NO6 | 3/1/2019 |Closing request |No new request |No update for new operation | null
ZS5 | 3/1/2019 |Request ended |New order placed |Product inventory counting | WHM
*day3*
AX1 | 4/1/2019 |Logistic planning |Closing the request |Product request ended | HTN
X5L | 4/1/2019 |Request processing |Planning logistic |The product request under-process | LKJ
NO6 | 4/1/2019 |No update |New Request Planing |No update for new operation | null
ZS5 | 4/1/2019 |New order started |Request invoicing |Inventory and logistic planing | KLW
My ultimate goal is to get the percentage of each buyer orders. And that would mean I need to look for the world in 3 columns (Last24, Next 24 and Summary) So to get that I need to make sure that if Product == Product and Buyer == Buyer then search Last24 or Next24 or Summary for string 'ended'
The result would look like this table
Product Date Lsat24 Next24 Summary Buyer
*day1*
NO6 | 2/1/2019 |Request in-process |Closing request |Operation product request ended | KLW
*day2*
ZS5 | 3/1/2019 |Request ended |New order placed |Product inventory counting | WHM
*day3*
AX1 | 4/1/2019 |Logistic planning |Closing the request |Product request ended | HTN
So the real data table (in excel) has over 200 products. These products will be repeated everyday. The Last24, Next24 and summary for each product is different every day. Also the buyer can change(Not always).
I am using Python3.8.
Edit"
To make my question clear I will use examples.
Product Date Lsat24 Next24 Summary Buyer
*day1*
0 AX1 | 2/1/2019 |Checking inventory |Invoicing |The product request started today | HTN
1 X5L | 2/1/2019 |Inventory counting |Future request planing|No operation update | null
2 NO6 | 2/1/2019 |Request in-process |Closing request |Operation product request ended | KLW
3 ZS5 | 2/1/2019 |Order placed |Processing request |Request started for the product | WHM
*day2*
4 AX1 | 3/1/2019 |Invoicing the order |Logistic planning |Product requested still pending | HTN
5 X5L | 3/1/2019 |New request planing |processing order |A new product request started | LKJ
6 NO6 | 3/1/2019 |Closing request |No new request |No update for new operation | null
7 ZS5 | 3/1/2019 |Request ended |New order placed |Product inventory counting | WHM
*day3*
8 AX1 | 4/1/2019 |Logistic planning |Closing the request |Product request ended | HTN
9 X5L | 4/1/2019 |Request processing |Planning logistic |The product request under-process | LKJ
10 NO6 | 4/1/2019 |No update |New Request Planing |No update for new operation | null
11 ZS5 | 4/1/2019 |New order started |Request invoicing |Inventory and logistic planing | KLW
One case
*if** Product == Product ([1] X5L == [5] X5L) then go for Buyer == Buyer ([1] null != [5] LKJ(since they do not match) search in row [1] for "ended" in Last24 or Next24 or Summary, if "ended" is found in at least one of the 3 columns return the row[1] else remove it)
The other case
*if** Product == Product ([0] AX1 == [4] AX1), then go for Buyer == Buyer ([0]HTN == [4]HTN, since they match go for the next row with the same product name. Product == Product ([4]AX1 ==[8] AX1), then Buyer == Buyer ([4]HTN == [8]HTN), since they match and no new row with the same product name, search in row [8] for "ended" in Last24 or Next24 or Summary).
It is a bit complicated, Iknow.
Upvotes: 1
Views: 76
Reputation: 863741
I think you need Series.str.endswith
for looking for string in 3 columns with DataFrame.any
for test at least one True
- one match per row and filter by boolean indexing
:
cols = ['Lsat24','Next24','Summary']
df = df[df[cols].apply(lambda x : x.str.endswith('ended')).any(axis=1)]
print (df)
Product Date Lsat24 Next24 \
2 NO6 2/1/2019 Request in-process Closing request
7 ZS5 3/1/2019 Request ended New order placed
8 AX1 4/1/2019 Logistic planning Closing the request
Summary Buyer
2 Operation product request ended KLW
7 Product inventory counting WHM
8 Product request ended HTN
Upvotes: 4