Carlo Allocca
Carlo Allocca

Reputation: 639

Labelling the rows of a Dataframe

I would like to add a new attribute "LabelId" to my dataframe based on the following conditions: given the input dataframe DF, find the k nlargest(k, velocity) and for each value I would like add and substruc some ms e.g. 50 by applying the apply(lambda x: x - np.timedelta64( 50, 'ms' ) ) and apply(lambda x: x + np.timedelta64( 50, 'ms' ) ) and set the new attribute LabelId to 1 otherwise to 0.

Input DF: considering 2 largest values of the velocity column and adding 50 ms.

    eventTime,               velocity
1,  2017-08-19T12:53:55:050,  3
2,  2017-08-19T12:53:55:100,  4
3,  2017-08-19T12:53:55:150, 180
4,  2017-08-19T12:53:55:200,  2
5,  2017-08-19T12:53:55:250,  5
6,  2017-08-19T12:53:55:050,  3
7,  2017-08-19T12:53:55:100,  4
8,  2017-08-19T12:53:55:150, 70
9,  2017-08-19T12:53:55:200,  2
10, 2017-08-19T12:53:55:250,  5

Output DF:

    eventTime,               velocity, LabelId
1,  2017-08-19T12:53:55:050,  3,        0
2,  2017-08-19T12:53:55:100,  4,        1
3,  2017-08-19T12:53:55:150, 180,       1 
4,  2017-08-19T12:53:55:200,  2,        1
5,  2017-08-19T12:53:55:250,  5,        0
6,  2017-08-19T12:53:55:050,  3,        0
7,  2017-08-19T12:53:55:100,  4,        1
8,  2017-08-19T12:53:55:150, 70,        1
9,  2017-08-19T12:53:55:200,  2,        1
10, 2017-08-19T12:53:55:250,  5,        0

I have been using the above functions but I did not get the right results. Please, any suggestion would be very appreciated. Many Thanks, Best Regards, Carlo

Upvotes: 1

Views: 55

Answers (1)

jezrael
jezrael

Reputation: 862511

You need top eventTime by nlargest, then add Timedelta and get max and minimal timestamp. Last create mask by between, which is cast to integers 0 and 1:

df['eventTime'] = pd.to_datetime(df['eventTime'], format='%Y-%m-%dT%H:%M:%S:%f')

a = df.nlargest(2, 'velocity')['eventTime']
low = min(a) - pd.Timedelta(50, unit='ms')
high = max(a) + pd.Timedelta(50, unit='ms')

#low   2017-08-19 12:53:55.100
#high   2017-08-19 12:53:55.200

df['LabelId'] = df['eventTime'].between(low,high).astype(int)
print (df)

Returns

                 eventTime  velocity  LabelId
1  2017-08-19 12:53:55.050         3        0
2  2017-08-19 12:53:55.100         4        1
3  2017-08-19 12:53:55.150       180        1
4  2017-08-19 12:53:55.200         2        1
5  2017-08-19 12:53:55.250         5        0
6  2017-08-19 12:53:55.050         3        0
7  2017-08-19 12:53:55.100         4        1
8  2017-08-19 12:53:55.150        70        1
9  2017-08-19 12:53:55.200         2        1
10 2017-08-19 12:53:55.250         5        0

Upvotes: 1

Related Questions