Reputation: 489
For each value('X' column) we have time frame window which defined as ['T1', 'T2']:
df = pd.DataFrame({'X': [1, 1, 1, 3, 3, 3],
'T1': ['2015-12-30 22:30:00',
'2015-12-30 22:30:15',
'2015-12-30 22:30:20',
'2015-11-30 23:40:00',
'2015-11-30 23:42:00',
'2015-11-30 23:40:00'],
'T2': ['2015-12-30 22:31:00',
'2015-12-30 22:30:45',
'2015-12-30 22:30:20',
'2015-11-30 23:43:00',
'2015-11-30 23:42:50',
'2015-11-30 23:44:00'],
})
X T1 T2
0 1 2015-12-30 22:30:00 2015-12-30 22:31:00
1 1 2015-12-30 22:30:15 2015-12-30 22:30:45
2 1 2015-12-30 22:30:20 2015-12-30 22:30:20
3 3 2015-11-30 23:40:00 2015-11-30 23:43:00
4 3 2015-11-30 23:42:00 2015-11-30 23:42:50
5 3 2015-11-30 23:40:00 2015-11-30 23:44:00
I want to count the same values('X') whose time frames inside(or equal) the time frame defined for this value, so I want to get next result:
df = pd.DataFrame({'X': [1, 1, 1, 3, 3, 3],
'T1': ['2015-12-30 22:30:00',
'2015-12-30 22:30:15',
'2015-12-30 22:30:20',
'2015-11-30 23:40:00',
'2015-11-30 23:42:00',
'2015-11-30 23:40:00'],
'T2': ['2015-12-30 22:31:00',
'2015-12-30 22:30:45',
'2015-12-30 22:30:20',
'2015-11-30 23:43:00',
'2015-11-30 23:42:50',
'2015-11-30 23:44:00'],
'count': [3, 2, 1, 2, 1, 3],
})
X T1 T2 count
0 1 2015-12-30 22:30:00 2015-12-30 22:31:00 3
1 1 2015-12-30 22:30:15 2015-12-30 22:30:45 2
2 1 2015-12-30 22:30:20 2015-12-30 22:30:20 1
3 3 2015-11-30 23:40:00 2015-11-30 23:43:00 2
4 3 2015-11-30 23:42:00 2015-11-30 23:42:50 1
3 2015-11-30 23:40:00 2015-11-30 23:44:00 3
Upvotes: 2
Views: 265
Reputation: 5502
You can do it with apply
.
Here is one way to do:
# Import module
import pandas as pd
# You dataframe
df = pd.DataFrame({ 'X': [1, 1, 1, 3, 3, 3],
'T1': ['2015-12-30 22:30:00',
'2015-12-30 22:30:15',
'2015-12-30 22:30:20',
'2015-11-30 23:40:00',
'2015-11-30 23:42:00',
'2015-11-30 23:40:00'],
'T2': ['2015-12-30 22:31:00',
'2015-12-30 22:30:45',
'2015-12-30 22:30:20',
'2015-11-30 23:43:00',
'2015-11-30 23:42:50',
'2015-11-30 23:44:00'],
})
# Convert columns to datetime object
df["T1"] = pd.to_datetime(df.T1, format="%Y-%m-%d %H:%M:%S")
df["T2"] = pd.to_datetime(df.T2, format="%Y-%m-%d %H:%M:%S")
# Function to comptute the number of element in the interval
def countIntervals(row):
return len(df[(df.X == row.X) & (df.T1 >= row.T1) & (df.T2 < row.T2)]) + 1
# Add new column
df["count"] = df.apply(countIntervals, axis=1)
print(df)
# X T1 T2 count
# 0 1 2015-12-30 22:30:00 2015-12-30 22:31:00 3
# 1 1 2015-12-30 22:30:15 2015-12-30 22:30:45 2
# 2 1 2015-12-30 22:30:20 2015-12-30 22:30:20 1
# 3 3 2015-11-30 23:40:00 2015-11-30 23:43:00 2
# 4 3 2015-11-30 23:42:00 2015-11-30 23:42:50 1
# 5 3 2015-11-30 23:40:00 2015-11-30 23:44:00 3
Upvotes: 2