Reputation: 13
I'm trying to run a Python (pandas) script that returns the initial_order_DT as a column and initial_order_value as another column.
Input:
Order ID Customer ID Value Date
00000001 0001 1500 10/01/2017
00000002 0001 1300 11/15/2017
00000003 0001 1200 12/12/2017
00000004 0002 500 12/13/2017
00000005 0003 1750 09/15/2017
00000006 0003 1800 10/01/2017
00000007 0003 1600 11/01/2017
00000008 0003 1400 12/01/2017
Output:
Order ID Customer ID Value Date First_Order_DT First_Value
00000001 0001 1500 // 10/01/2017 1500
00000002 0001 1300 // 10/01/2017 1500
00000003 0001 1200 // 10/01/2017 1500
00000004 0002 500 // 12/13/2017 500
00000005 0003 1750 // 09/15/2017 1750
00000006 0003 1800 // 09/15/2017 1750
00000007 0003 1600 // 09/15/2017 1750
00000008 0003 1400 etc 09/15/2017 1750
Upvotes: 0
Views: 641
Reputation: 402493
You can do this with groupby
+ transform
-
v = df.groupby('Customer ID')['Date', 'Value'].transform('first').add_prefix('First_')
v
First_Date First_Value
0 10/01/2017 1500
1 10/01/2017 1500
2 10/01/2017 1500
3 12/13/2017 500
4 09/15/2017 1750
5 09/15/2017 1750
6 09/15/2017 1750
7 09/15/2017 1750
Concatenate v
with the original df
-
df = pd.concat([df, v], 1)
df
Order ID Customer ID Value Date First_Date First_Value
0 1 1 1500 10/01/2017 10/01/2017 1500
1 2 1 1300 11/15/2017 10/01/2017 1500
2 3 1 1200 12/12/2017 10/01/2017 1500
3 4 2 500 12/13/2017 12/13/2017 500
4 5 3 1750 09/15/2017 09/15/2017 1750
5 6 3 1800 10/01/2017 09/15/2017 1750
6 7 3 1600 11/01/2017 09/15/2017 1750
7 8 3 1400 12/01/2017 09/15/2017 1750
Upvotes: 2