Reputation: 47
How can I add a new column, 'new feature', based on column A and the Timeseries column?
Column A: rows 5, 8 and 9 have the same value.
Timeseries (hh:mm:ss): 115312, 115313 and 115314 are almost the same time (say, within a range of 3 minutes).
New feature: such rows can be put into one group and flagged as '1'; all other rows are flagged as '0'.
My original dataset has thousands and thousands of different values in column A. Is there a simple way to create a new feature like the one in the table below?
       A  TImeseries  new feature
1   1314      121314            1
2   2245      121415            1
3   1362      122241            0
4   3655      122512            0
5   4444      115312            1
6   1314      121414            1
7   2245      121515            1
8   4444      115313            1
9   4444      115314            1
10  2245      121615            1
Upvotes: 1
Views: 43
Reputation: 863301
Use:
import pandas as pd

#3 minutes threshold
N = pd.Timedelta(3 * 60, unit='s')
#convert hhmmss values to timedeltas (regex=True is required in pandas 2.0+)
s = df['TImeseries'].astype(str).str.replace(r'(\d{2})(\d{2})(\d{2})', r'\1:\2:\3', regex=True)
df['TImeseries'] = pd.to_timedelta(s)
#sort by both columns
df = df.sort_values(['A','TImeseries'])
#get difference per group and compare with the threshold
m1 = df.groupby('A')['TImeseries'].diff().fillna(pd.Timedelta(0)) < N
#get duplicates by A column
m2 = df['A'].duplicated(keep=False)
#chain by bitwise AND and convert to integers
df['new'] = (m1 & m2).astype(int)
#restore the original ordering of the index
df = df.sort_index()
print (df)
       A TImeseries  new feature  new
1   1314   12:13:14            1    1
2   2245   12:14:15            1    1
3   1362   12:22:41            0    0
4   3655   12:25:12            0    0
5   4444   11:53:12            1    1
6   1314   12:14:14            1    1
7   2245   12:15:15            1    1
8   4444   11:53:13            1    1
9   4444   11:53:14            1    1
10  2245   12:16:15            1    1
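One caveat with the approach above: the first row of each A group gets its missing difference filled with zero, so it is flagged whenever A is duplicated, even if its nearest neighbour is hours away. If that matters for your data, a variant that checks the gap to both the previous and the next row could look like the sketch below. The helper name `flag_close_in_time` and its `window` parameter are made up for illustration, and it assumes the values are hhmmss integers and that the index is unique.

```python
import pandas as pd

# Sample data copied from the question; the index runs 1..10 as in the post.
df = pd.DataFrame(
    {
        "A": [1314, 2245, 1362, 3655, 4444, 1314, 2245, 4444, 4444, 2245],
        "TImeseries": [121314, 121415, 122241, 122512, 115312,
                       121414, 121515, 115313, 115314, 121615],
    },
    index=range(1, 11),
)


def flag_close_in_time(frame, window=pd.Timedelta(minutes=3)):
    """Hypothetical helper: 1 if another row with the same A lies within `window`."""
    # Zero-pad to six digits so that e.g. 91514 is read as 09:15:14 (assumed hhmmss).
    digits = frame["TImeseries"].astype(str).str.zfill(6)
    times = pd.to_timedelta(digits.str[:2] + ":" + digits.str[2:4] + ":" + digits.str[4:])

    # Sort inside each A group so neighbouring rows are neighbours in time.
    ordered = frame.assign(_t=times).sort_values(["A", "_t"])
    grouped = ordered.groupby("A")["_t"]

    # Gap to the previous and to the next row within the same A group.
    # A missing neighbour gives NaT, and a comparison with NaT is False.
    near_prev = grouped.diff() < window
    near_next = grouped.diff(-1).abs() < window

    # Flag rows that have at least one close neighbour; restore the input order.
    return (near_prev | near_next).astype(int).reindex(frame.index)


df["new"] = flag_close_in_time(df)
print(df)
```

For the sample data this gives the same flags as the 'new feature' column. A value of A that occurs only once, or whose occurrences are all more than 3 minutes apart, gets 0 in every row.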
Upvotes: 1