Reputation: 101
I have two pandas dataframes (actual dataframes are much larger):
import pandas as pd
events = pd.DataFrame({'Begin': [959.44, 1222.82, 2217.59], 'End': [978.00, 1240.41, 2799.43]})
markers = pd.DataFrame({'Marker': [0, 256.0, 700, 975.33, 1188.2, 1230.88, 2500, 3120.22]})
I want to split the events across the markers, which I'm treating as bin edges, that is, [0, 256.0], [256.0, 700], etc. The goal is to add another column to the markers dataframe holding the total event duration observed within each bin. An event may fall into multiple bins. For example, the 959.44 to 978.00 event should have 15.89 (975.33 - 959.44) counted in the 700 to 975.33 bin, and the remaining 2.67 (978.00 - 975.33) counted in the 975.33 to 1188.2 bin.
I've been trying to use pandas.cut to bin the markers dataframe, but I'm not sure how to account for an event that spans multiple bins. Is this the best way to do this?
Upvotes: 0
Views: 83
Reputation: 30605
IIUC, you can build an IntervalIndex from the markers to get the ranges, then use get_loc to look up the marker value for each event, i.e.
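# Left edge of each bin is the previous marker
markers['Begin'] = markers['Marker'].shift()
# Order the columns as Begin, Marker and drop the first row, which has no left edge
nm = markers.sort_index(axis=1).dropna()
nm.index = pd.IntervalIndex.from_arrays(nm['Begin'], nm['Marker'])
# Right edge of the bin that contains each event's Begin
events['mark'] = events['Begin'].apply(lambda x: nm.iloc[nm.index.get_loc(x)]['Marker'])
# Time from the event's Begin to the end of that bin
events['new'] = events['mark'] - events['Begin']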
Output:
     Begin      End     mark     new
0   959.44   978.00   975.33   15.89
1  1222.82  1240.41  1230.88    8.06
2  2217.59  2799.43  2500.00  282.41
Explanation
Create an interval index by shifting Marker and dropping the NaN row, i.e.
nm.index = pd.IntervalIndex.from_arrays(nm['Begin'], nm['Marker'])
                      Begin   Marker
(0.0, 256.0]           0.00   256.00
(256.0, 700.0]       256.00   700.00
(700.0, 975.33]      700.00   975.33
(975.33, 1188.2]     975.33  1188.20
(1188.2, 1230.88]   1188.20  1230.88
(1230.88, 2500.0]   1230.88  2500.00
(2500.0, 3120.22]   2500.00  3120.22
Look up each event's Begin in the interval index with get_loc, which returns the position of the interval that contains it, then take the Marker value at that position, i.e.
     Begin      End     mark
0   959.44   978.00   975.33
1  1222.82  1240.41  1230.88
2  2217.59  2799.43  2500.00
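To make the lookup step concrete, here is a small sketch of get_loc on its own, plus get_indexer as a vectorized alternative to apply. It builds the intervals with IntervalIndex.from_breaks instead of the shifted column used above (my substitution, not part of the original code), and the names edges, bins, pos, right_edge and positions are illustrative.

```python
import pandas as pd

# Bin edges copied from the question's markers.
edges = [0, 256.0, 700, 975.33, 1188.2, 1230.88, 2500, 3120.22]

# Right-closed intervals by default: (0, 256], (256, 700], ...
bins = pd.IntervalIndex.from_breaks(edges)

# get_loc returns the integer position of the interval containing the value.
pos = bins.get_loc(959.44)
right_edge = bins[pos].right  # right edge of that interval

# get_indexer does the same lookup for many values at once
# and gives -1 for values that fall in no interval.
positions = bins.get_indexer([959.44, 1222.82, 2217.59, 5000.0])
```

Note that get_loc raises a KeyError for a value outside every interval, so with real data it may be safer to use get_indexer and handle the -1 entries explicitly.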
Finally, subtract Begin from mark to get the new column.
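The mark and new columns cover only the part of each event that falls in the bin containing its Begin. If the remainder that spills into later bins is also needed, as in the question's example, one possible approach is to compute the overlap of every event with every bin using NumPy broadcasting. This is a rough sketch rather than a tested solution for large data: the names left, right, overlap and totals are my own, and it assumes the markers are sorted and that events do not extend past the last marker.

```python
import numpy as np
import pandas as pd

events = pd.DataFrame({'Begin': [959.44, 1222.82, 2217.59],
                       'End': [978.00, 1240.41, 2799.43]})
markers = pd.DataFrame({'Marker': [0, 256.0, 700, 975.33, 1188.2,
                                   1230.88, 2500, 3120.22]})

edges = markers['Marker'].to_numpy()
left, right = edges[:-1], edges[1:]          # one (left, right) pair per bin

begin = events['Begin'].to_numpy()[:, None]  # shape (n_events, 1)
end = events['End'].to_numpy()[:, None]

# Overlap of every event with every bin; a negative value means no overlap.
overlap = np.minimum(end, right) - np.maximum(begin, left)
overlap = np.clip(overlap, 0, None)          # shape (n_events, n_bins)

# Total event duration that falls inside each bin.
totals = pd.DataFrame({'Left': left, 'Right': right,
                       'Total': overlap.sum(axis=0)})
```

For the sample data this puts 15.89 in the 700 to 975.33 bin and 2.67 in the 975.33 to 1188.2 bin, matching the question. If a running total is wanted, totals['Total'].cumsum() gives it. The intermediate array has n_events x n_bins entries, so for very large inputs it may need to be processed in chunks.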
Hope it helps.
Upvotes: 1