Reputation: 2146
I have a log file like below
Source log file
OrderID,PathCode
1000,0
2000,1
3000,2
4000,1
5000,2
6000,0
7000,2
8000,1
9000,0
But Destination log file doesnt contain the PathCode it came from, it only knows the sequence as below.
OrderID
3000
9999 #Ignore this since not in source OrderID
1000
2000
5000
9991 #Ignore this since not in source OrderID
4000
6000
9000
8000
7000
9998 #Ignore this since not in source OrderID
How to add PathCode to the dest file. OrderID is always unique
Expected Output
OrderID,PathCode
3000,2
1000,0
2000,1
5000,2
4000,1
6000,0
9000,0
8000,1
7000,2
Basically I have to sort source file according to sequence given in destination file
Upvotes: 1
Views: 57
Reputation: 38415
You can use map
dest['pathcode'] = dest['OrderID'].map(source.set_index('OrderID')['PathCode'])
You get
OrderID pathcode
0 3000 2
1 1000 0
2 2000 1
3 5000 2
4 4000 1
5 6000 0
6 9000 0
7 8000 1
8 7000 2
To get the group of 3 rows and find the first pathcode try,
dest.iloc[0::3, 1].value_counts() #df.iloc[0::3] returns row no 0,3,6,9 etc
2 2
0 1
so pathcode 2 came first 2 times and 0, 1 time.
EDIT: I tried the code with your test case dest file. The map works as expected and you need to dropna but you also need to reindex for the slice to work.
dest.dropna(inplace=True)
dest.reset_index(drop=True, inplace=True)
dest.iloc[0::3, 1].value_counts()
You get
2 2
0 1
Upvotes: 1