Reputation: 475
I have a data frame that has a bunch of data in it. I have a column that is a datetime64. The column looks like (this is a small chunk. I have a total of 650 rows):
Time
0 2019-09-04 23:44:52
1 2019-09-02 20:38:40
2 2019-09-02 18:06:50
3 2019-09-04 20:18:06
4 2019-09-03 00:32:32
I then have a list with some benchmark days that looks like:
2019-08-30 23:59:59
2019-08-31 23:59:59
2019-09-01 23:59:59
2019-09-02 23:59:59
2019-09-03 23:59:59
2019-09-04 23:59:59
2019-09-05 23:59:59
2019-09-06 23:59:59
2019-09-07 23:59:59
2019-09-08 23:59:59
2019-09-09 23:59:59
I want to check to see how many days in my column lay in between each checkmark on my list. I could not get things working using .count or .sum. I also tried just finding the number of entries in the column that were greater than or equal each item in my list but it is but I get 650 for all of them. Not sure what I am doing wrong. My last method looked something like:
count = []
i =0
for val in dates:
value = (df[column_name] >= val).count()
counts[i] = value
i = i+1
I would rather do something like
count = []
i =0
while i < len(dates)-1:
value = (df['Time'] >= dates[i] && df['Time'] < dates[i+1]).count()
counts[i] = value
i = i+1
value = (df['Time'] >= dates[i])
counts[i] = value
But I am not sure how to accomplish this. Some help would be greatly appreciated.
I would like to end up with a list counts that contains the count that lies in the ranges using the dates inside dates.
Just looking at the days I provided the output would be something like: Days between
2019-08-30 23:59:59 - 2019-08-31 23:59:59 = 0
2019-08-31 23:59:59 - 2019-09-01 23:59:59 = 0
2019-09-01 23:59:59 - 2019-09-02 23:59:59 = 0
2019-09-02 23:59:59 - 2019-09-03 23:59:59 = 2
2019-09-03 23:59:59 - 2019-09-04 23:59:59 = 1
2019-09-04 23:59:59 - 2019-09-05 23:59:59 = 2
2019-09-05 23:59:59 - 2019-09-06 23:59:59 = 0
2019-09-06 23:59:59 - 2019-09-07 23:59:59 = 0
2019-09-07 23:59:59 - 2019-09-08 23:59:59 = 0
2019-09-08 23:59:59 - 2019-09-09 23:59:59 = 0
greater than 2019-09-09 23:59:59 = 0
counts = [0,0,0,2,1,2,0,0,0,0]
Upvotes: 1
Views: 1686
Reputation: 188
Assuming that this is df:
dates
2019-08-30 23:59:59
2019-08-31 23:59:59
2019-09-01 23:59:59
2019-09-02 23:59:59
2019-09-03 23:59:59
2019-09-04 23:59:59
2019-09-05 23:59:59
2019-09-06 23:59:59
2019-09-07 23:59:59
2019-09-08 23:59:59
2019-09-09 23:59:59
And this is df2
Time
2019-09-04 23:44:52
2019-09-02 20:38:40
2019-09-02 18:06:50
2019-09-04 20:18:06
2019-09-03 00:32:32
Assuming that the dates are in datetime64, this is the code for counting dates. Basically it holds each date ranges per loop, then checks each time log on those date ranges if they fit in it.
holder = ''
day_dif = []
for index,row in df.iterrows():
i = 0
if holder == '':
holder = row['dates']
continue
for time in df2['Time']:
first_date = holder
end_date = row['dates']
if (time >= first_date) & (time <= end_date):
i += 1
day_dif.append(i)
print(i)
holder = row['dates']
Printed Output:
2019-08-30 23:59:00 2019-08-31 23:59:00 0
2019-08-31 23:59:00 2019-09-01 23:59:00 0
2019-09-01 23:59:00 2019-09-02 23:59:00 2
2019-09-02 23:59:00 2019-09-03 23:59:00 1
2019-09-03 23:59:00 2019-09-04 23:59:00 2
2019-09-04 23:59:00 2019-09-05 23:59:00 0
2019-09-05 23:59:00 2019-09-06 23:59:00 0
2019-09-06 23:59:00 2019-09-07 23:59:00 0
2019-09-07 23:59:00 2019-09-08 23:59:00 0
2019-09-08 23:59:00 2019-09-09 23:59:00 0
List output of day_dif list:
[0, 0, 2, 1, 2, 0, 0, 0, 0, 0]
Upvotes: 1