Hsu.George

Reputation: 47

How to add a new feature based on other columns with the same value

How can I add a 'new feature' column based on column A and Timeseries?

Column A: rows 5, 8 and 9 have the same value.

Timeseries (hh:mm:ss): 115312, 115313 and 115314 are almost the same time (say, within a range of 3 minutes).

New feature: rows like these belong to one group and are flagged as '1'; all other rows are flagged as '0'.

My original dataset has thousands of different values in column A. Is there a simple way to create a new feature like the one in the table below?

      A  TImeseries new feature
1   1314    121314     1
2   2245    121415     1
3   1362    122241     0
4   3655    122512     0
5   4444    115312     1
6   1314    121414     1
7   2245    121515     1
8   4444    115313     1
9   4444    115314     1
10  2245    121615     1

Upvotes: 1

Views: 43

Answers (1)

jezrael

Reputation: 863301

Use:

import pandas as pd

# 3-minute threshold
N = pd.Timedelta(3 * 60, unit='s')
# convert HHMMSS values to timedeltas (regex=True is required in pandas 2.0+)
s = df['TImeseries'].astype(str).str.replace(r'(\d{2})(\d{2})(\d{2})', r'\1:\2:\3', regex=True)
df['TImeseries'] = pd.to_timedelta(s)
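# sort by both columns so rows with the same A are ordered by time
df = df.sort_values(['A','TImeseries'])

# difference to the previous row within each group, compared with the threshold
# (the first row of each group has no previous row, so its difference is set to 0)
m1 = df.groupby('A')['TImeseries'].diff().fillna(pd.Timedelta(0)) < N
# rows whose A value occurs more than once
m2 = df['A'].duplicated(keep=False)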

# combine both masks with bitwise AND and convert to integers
# (astype replaces Series.view, which is deprecated in recent pandas)
df['new'] = (m1 & m2).astype(int)
# restore the original row order
df = df.sort_index()
print(df)
       A TImeseries  new feature  new
1   1314   12:13:14            1    1
2   2245   12:14:15            1    1
3   1362   12:22:41            0    0
4   3655   12:25:12            0    0
5   4444   11:53:12            1    1
6   1314   12:14:14            1    1
7   2245   12:15:15            1    1
8   4444   11:53:13            1    1
9   4444   11:53:14            1    1
10  2245   12:16:15            1    1
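
Note that, because the first row of every group gets a difference of 0, it is flagged whenever its A value is duplicated, even if the next row of that group is hours later. If that matters for your data, here is a minimal sketch of a variant that flags a row only when the previous or the next row of the same A is within the threshold. It is not part of the code above: the slicing-based time parsing, the helper names (hhmmss, order, near_prev, near_next) and the zero-padding of short values are my own assumptions. The sample frame is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "A": [1314, 2245, 1362, 3655, 4444, 1314, 2245, 4444, 4444, 2245],
        "TImeseries": [121314, 121415, 122241, 122512, 115312,
                       121414, 121515, 115313, 115314, 121615],
    },
    index=range(1, 11),
)

N = pd.Timedelta(minutes=3)  # threshold from the question

# Parse HHMMSS integers into timedeltas.
# Assumption: values shorter than 6 digits are left-padded with zeros.
hhmmss = df["TImeseries"].astype(str).str.zfill(6)
t = pd.to_timedelta(hhmmss.str[:2] + ":" + hhmmss.str[2:4] + ":" + hhmmss.str[4:])

# Sort by A, then by time, so neighbours within a group are adjacent in time.
order = pd.DataFrame({"A": df["A"], "t": t}).sort_values(["A", "t"])
g = order.groupby("A")["t"]

# Gap to the previous and to the next row of the same A.
# Group edges give NaT, and NaT < N evaluates to False.
near_prev = g.diff() < N
near_next = g.diff(-1).abs() < N

# Column assignment aligns on the index, so the original row order is kept.
df["new"] = (near_prev | near_next).astype(int)
print(df)
```

On the sample data this gives the same flags as the 'new feature' column in the question; the two versions differ only when the first row of a group is far from the rest of its group.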

Upvotes: 1
