new_programmer_22
new_programmer_22

Reputation: 475

Using .count on datetime in pandas dataframe

I have a data frame that has a bunch of data in it. I have a column that is a datetime64. The column looks like (this is a small chunk. I have a total of 650 rows):

Time
0      2019-09-04 23:44:52
1      2019-09-02 20:38:40
2      2019-09-02 18:06:50
3      2019-09-04 20:18:06
4      2019-09-03 00:32:32

I then have a list with some benchmark days that looks like:

2019-08-30 23:59:59
2019-08-31 23:59:59
2019-09-01 23:59:59
2019-09-02 23:59:59
2019-09-03 23:59:59
2019-09-04 23:59:59
2019-09-05 23:59:59
2019-09-06 23:59:59
2019-09-07 23:59:59
2019-09-08 23:59:59
2019-09-09 23:59:59

I want to check to see how many days in my column lay in between each checkmark on my list. I could not get things working using .count or .sum. I also tried just finding the number of entries in the column that were greater than or equal each item in my list but it is but I get 650 for all of them. Not sure what I am doing wrong. My last method looked something like:

count = []
i =0
for val in dates:
   value = (df[column_name] >= val).count()
   counts[i] = value
   i = i+1

I would rather do something like

count = []
i =0
while i < len(dates)-1:
   value = (df['Time'] >= dates[i] && df['Time'] < dates[i+1]).count()
   counts[i] = value
   i = i+1
value = (df['Time'] >= dates[i])
counts[i] = value

But I am not sure how to accomplish this. Some help would be greatly appreciated.

I would like to end up with a list counts that contains the count that lies in the ranges using the dates inside dates.

Just looking at the days I provided the output would be something like: Days between

2019-08-30 23:59:59 - 2019-08-31 23:59:59 = 0
2019-08-31 23:59:59 - 2019-09-01 23:59:59 = 0
2019-09-01 23:59:59 - 2019-09-02 23:59:59 = 0
2019-09-02 23:59:59 - 2019-09-03 23:59:59 = 2
2019-09-03 23:59:59 - 2019-09-04 23:59:59 = 1
2019-09-04 23:59:59 - 2019-09-05 23:59:59 = 2
2019-09-05 23:59:59 - 2019-09-06 23:59:59 = 0
2019-09-06 23:59:59 - 2019-09-07 23:59:59 = 0
2019-09-07 23:59:59 - 2019-09-08 23:59:59 = 0
2019-09-08 23:59:59 - 2019-09-09 23:59:59 = 0
greater than 2019-09-09 23:59:59 = 0

counts = [0,0,0,2,1,2,0,0,0,0]

Upvotes: 1

Views: 1686

Answers (1)

AminoAcid
AminoAcid

Reputation: 188

Assuming that this is df:

dates
2019-08-30 23:59:59
2019-08-31 23:59:59
2019-09-01 23:59:59
2019-09-02 23:59:59
2019-09-03 23:59:59
2019-09-04 23:59:59
2019-09-05 23:59:59
2019-09-06 23:59:59
2019-09-07 23:59:59
2019-09-08 23:59:59
2019-09-09 23:59:59

And this is df2

Time
2019-09-04 23:44:52
2019-09-02 20:38:40
2019-09-02 18:06:50
2019-09-04 20:18:06
2019-09-03 00:32:32

Assuming that the dates are in datetime64, this is the code for counting dates. Basically it holds each date ranges per loop, then checks each time log on those date ranges if they fit in it.

holder = ''
day_dif = []

for index,row in df.iterrows():
    i = 0
    if holder == '':
        holder = row['dates']
        continue

    for time in df2['Time']:

        first_date = holder
        end_date = row['dates']

        if (time >= first_date) & (time <= end_date):
            i += 1

    day_dif.append(i)
    print(i)
    holder = row['dates']

Printed Output:

2019-08-30 23:59:00 2019-08-31 23:59:00 0
2019-08-31 23:59:00 2019-09-01 23:59:00 0
2019-09-01 23:59:00 2019-09-02 23:59:00 2
2019-09-02 23:59:00 2019-09-03 23:59:00 1
2019-09-03 23:59:00 2019-09-04 23:59:00 2
2019-09-04 23:59:00 2019-09-05 23:59:00 0
2019-09-05 23:59:00 2019-09-06 23:59:00 0
2019-09-06 23:59:00 2019-09-07 23:59:00 0
2019-09-07 23:59:00 2019-09-08 23:59:00 0
2019-09-08 23:59:00 2019-09-09 23:59:00 0

List output of day_dif list:

[0, 0, 2, 1, 2, 0, 0, 0, 0, 0]

Upvotes: 1

Related Questions