Reputation: 197
I have a dataframe like below:
timestamp type value
0 1629453592426 0 10
1 1629453594429 0 120000
2 1629453654582 0 110000
3 1629453658662 1 22
4 1629453661342 1 44
5 1629453686644 0 115000
6 1629453689849 0 50
I want the value to be set to 0 when the timestamp difference from the previous row is less than 60000 and the value is greater than 100000 or less than -100000, provided that type didn't change from the previous row.
So the expected output is:
timestamp type value
0 1629453592426 0 10
1 1629453594429 0 0
2 1629453654582 0 110000
3 1629453658662 1 22
4 1629453661342 1 44
5 1629453686644 0 115000
6 1629453689849 0 50
I tried to do it by creating a timestamp_jump column with:
df["timestamp_jump"] = df["timestamp"].diff().gt(60000).cumsum()
And then grouping the dataframe by the timestamp_jump and type columns:
df.groupby(["timestamp_jump", "type"])[["value"]]
But couldn't figure out how to do the other steps. How can I achieve my expected dataframe?
Upvotes: 0
Views: 81
Reputation: 23217
You can label each group of consecutive rows with the same type by calling .cumsum() on a comparison of the current row's type with the previous row's type (not equal marks the start of a new group).
Then use .mask to change value to 0 where both conditions hold: (1) value > 100000 or value < -100000, and (2) the timestamp difference is < 60000 (within the same group of consecutive same type), as follows:
# set group of consecutive same `type`
g = df['type'].ne(df['type'].shift()).cumsum()
df['value'] = (df['value'].mask(
df['value'].abs().gt(100000) &
df.groupby(g)["timestamp"].diff().lt(60000)
, 0)
)
Result:
print(df)
timestamp type value
0 1629453592426 0 10
1 1629453594429 0 0
2 1629453654582 0 110000
3 1629453658662 1 22
4 1629453661342 1 44
5 1629453686644 0 115000
6 1629453689849 0 50
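To see why only row 1 is changed, it can help to print the intermediate pieces. The following is a minimal sketch built on the sample data from the question; the column names group, gap and hit, and the debug variable, are illustrative additions and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": [1629453592426, 1629453594429, 1629453654582,
                  1629453658662, 1629453661342, 1629453686644, 1629453689849],
    "type": [0, 0, 0, 1, 1, 0, 0],
    "value": [10, 120000, 110000, 22, 44, 115000, 50],
})

# A new group starts wherever `type` differs from the previous row.
g = df["type"].ne(df["type"].shift()).cumsum()

# Timestamp difference to the previous row, restarted within each group.
# The first row of every group gets NaN, so it can never satisfy lt(60000).
gap = df.groupby(g)["timestamp"].diff()

# `group`, `gap` and `hit` are illustrative helper columns (assumed names).
debug = df.assign(
    group=g,
    gap=gap,
    hit=df["value"].abs().gt(100000) & gap.lt(60000),
)
print(debug)
```

Row 5 (value 115000) is the first row of its group, so its gap is NaN and it is left alone even though it is less than 60000 ms after row 4. Row 2 (value 110000) stays because its gap to row 1 is 60153.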
Upvotes: 1
Reputation: 120429
Another way:
df.loc[df['type'].eq(df['type'].shift())
& df['timestamp'].diff().lt(60000)
& df['value'].abs().gt(100000), 'value'] = 0
Output:
>>> df
timestamp type value
0 1629453592426 0 10
1 1629453594429 0 0
2 1629453654582 0 110000
3 1629453658662 1 22
4 1629453661342 1 44
5 1629453686644 0 115000
6 1629453689849 0 50
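If the thresholds are likely to change, the same three conditions could be wrapped in a small helper. This is only a sketch: the function name zero_spikes and the parameters max_gap and limit are hypothetical, and it returns a copy rather than modifying df in place, which differs from the one-liner above.

```python
import pandas as pd

def zero_spikes(df, max_gap=60000, limit=100000):
    """Return a copy of df with `value` set to 0 where all three conditions hold."""
    same_type = df["type"].eq(df["type"].shift())        # type unchanged from previous row
    close_in_time = df["timestamp"].diff().lt(max_gap)   # gap to previous row below max_gap
    spike = df["value"].abs().gt(limit)                  # value above limit or below -limit
    out = df.copy()
    out.loc[same_type & close_in_time & spike, "value"] = 0
    return out

df = pd.DataFrame({
    "timestamp": [1629453592426, 1629453594429, 1629453654582,
                  1629453658662, 1629453661342, 1629453686644, 1629453689849],
    "type": [0, 0, 0, 1, 1, 0, 0],
    "value": [10, 120000, 110000, 22, 44, 115000, 50],
})

result = zero_spikes(df)
print(result["value"].tolist())  # [10, 0, 110000, 22, 44, 115000, 50]
```

Naming each condition separately makes it easier to inspect which one excludes a given row.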
Upvotes: 2
Reputation: 195
Try iterating over all the rows in the df using iterrows() and then applying the conditions to the current row and the previous row, as below:
import pandas as pd
df = pd.DataFrame()
df['timestamp'] = [1629453592426, 1629453594429, 1629453654582, 1629453658662, 1629453661342, 1629453686644, 1629453689849]
df['type'] = [0,0,0,1,1,0,0]
df['value'] = [10,120000,110000,22,44,115000,50]
df
timestamp type value
0 1629453592426 0 10
1 1629453594429 0 120000
2 1629453654582 0 110000
3 1629453658662 1 22
4 1629453661342 1 44
5 1629453686644 0 115000
6 1629453689849 0 50
for index, row in df.iterrows():
    if index != 0:
        current_timestamp = row['timestamp']
        prev_timestamp = df['timestamp'].iloc[index-1]
        current_val = row['value']
        current_type = row['type']
        prev_type = df['type'].iloc[index-1]
        # abs() also catches values below -100000
        if (abs(current_timestamp - prev_timestamp) < 60000) and abs(current_val) > 100000 and (current_type == prev_type):
            df.at[index, 'value'] = 0
df
timestamp type value
0 1629453592426 0 10
1 1629453594429 0 0
2 1629453654582 0 110000
3 1629453658662 1 22
4 1629453661342 1 44
5 1629453686644 0 115000
6 1629453689849 0 50
Upvotes: 0
Reputation: 81
Try this:
for i in range(1, df.shape[0]):
    # all three conditions must hold for the same row
    if ((df['timestamp'][i] - df['timestamp'][i-1]) < 60000
            and abs(df['value'][i]) > 100000
            and df['type'][i] == df['type'][i-1]):
        # .loc avoids chained assignment
        df.loc[i, 'value'] = 0
Upvotes: 0