Reputation: 1129
I currently have a pandas DataFrame that looks similar to this:
Data
2018-07-02 15.60
2018-07-03 16.14
2018-07-09 12.69
2018-07-17 12.06
2018-07-19 12.87
2018-07-27 13.03
2018-08-01 13.15
2018-08-02 12.19
2018-08-07 10.93
2018-08-09 11.27
2018-08-10 13.16
2018-08-16 13.45
2018-08-17 12.64
2018-08-20 12.49
2018-08-21 12.86
2018-08-22 12.25
I also have a list of dates:
dates = [datetime.datetime(2018, 8, 20, 0, 0),
         datetime.datetime(2018, 8, 1, 0, 0),
         datetime.datetime(2018, 7, 2, 0, 0)]
I want to add another column to my DataFrame called 'Count'. It should start counting at 1 on the latest date in the dates list, increase going back in time, and reset to 1 whenever the DataFrame's DatetimeIndex matches the next date in the list. The end result should look like this:
Data Count
2018-07-02 15.60 1
2018-07-03 16.14 6
2018-07-09 12.69 5
2018-07-17 12.06 4
2018-07-19 12.87 3
2018-07-27 13.03 2
2018-08-01 13.15 1
2018-08-02 12.19 7
2018-08-07 10.93 6
2018-08-09 11.27 5
2018-08-10 13.16 4
2018-08-16 13.45 3
2018-08-17 12.64 2
2018-08-20 12.49 1
What is the most efficient way to do it on a large DataFrame?
Upvotes: 1
Views: 544
Reputation: 282
Write a function that finds the minimum time difference between a value of the DatetimeIndex and the dates in the list. Then map it over the index of the DataFrame to obtain the new column.
You can adapt the code below to your requirements.
import datetime

dates = [datetime.datetime(2018, 8, 20, 0, 0),
         datetime.datetime(2018, 8, 1, 0, 0),
         datetime.datetime(2018, 7, 2, 0, 0)]

def time_diff(i):
    # Differences to the listed dates that fall on or after index value i
    diff_values = [j - i for j in dates if j >= i]
    if len(diff_values) > 0:
        # Smallest gap, returned as a time difference rather than a row count
        return min(diff_values)

dataframe["Count"] = dataframe.index.map(time_diff)
Upvotes: 0
Reputation: 862591
Use GroupBy.cumcount with a helper Series created by isin and cumsum; to reverse the ordering, add [::-1]:
s = pd.Series(df.index.isin(dates), index=df.index)[::-1].cumsum()
df['Count'] = df.groupby(s).cumcount(ascending=False) + 1
print(df)
Data Count
2018-07-02 15.60 1
2018-07-03 16.14 6
2018-07-09 12.69 5
2018-07-17 12.06 4
2018-07-19 12.87 3
2018-07-27 13.03 2
2018-08-01 13.15 1
2018-08-02 12.19 7
2018-08-07 10.93 6
2018-08-09 11.27 5
2018-08-10 13.16 4
2018-08-16 13.45 3
2018-08-17 12.64 2
2018-08-20 12.49 1
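To make the intermediate steps easier to follow, here is a self-contained sketch of the same approach on the question's sample data. The names `is_reset` and `group_id` are illustrative and not part of the original snippet, and `.iloc[::-1]` is used as an explicit positional form of `[::-1]`. The sample stops at 2018-08-20, matching the rows shown in the expected output.

```python
import datetime
import pandas as pd

# Sample data from the question, up to the latest date in `dates`.
index = pd.to_datetime([
    "2018-07-02", "2018-07-03", "2018-07-09", "2018-07-17", "2018-07-19",
    "2018-07-27", "2018-08-01", "2018-08-02", "2018-08-07", "2018-08-09",
    "2018-08-10", "2018-08-16", "2018-08-17", "2018-08-20",
])
data = [15.60, 16.14, 12.69, 12.06, 12.87, 13.03, 13.15,
        12.19, 10.93, 11.27, 13.16, 13.45, 12.64, 12.49]
df = pd.DataFrame({"Data": data}, index=index)

dates = [datetime.datetime(2018, 8, 20),
         datetime.datetime(2018, 8, 1),
         datetime.datetime(2018, 7, 2)]

# Step 1: True where the index value is one of the reset dates.
is_reset = pd.Series(df.index.isin(dates), index=df.index)

# Step 2: walk from the last row backwards; the running sum increases by one
# at every reset date, so each block of rows ending on a reset date shares an id.
group_id = is_reset.iloc[::-1].cumsum()

# Step 3: number the rows inside each block in descending order.
# groupby aligns group_id with df by index label, so the reversed order is fine.
df["Count"] = df.groupby(group_id).cumcount(ascending=False) + 1

print(df)
```

If the DataFrame contains rows after the latest date in `dates`, they fall into a block of their own (running sum 0) and are numbered as well, so they may need to be filtered out or masked separately.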
Upvotes: 2