bhdrozgn

Reputation: 197

Change column value based on difference of another column

I have a dataframe like below:

    timestamp        type        value
0   1629453592426    0           10
1   1629453594429    0           120000
2   1629453654582    0           110000
3   1629453658662    1           22
4   1629453661342    1           44
5   1629453686644    0           115000
6   1629453689849    0           50

I want to set value to 0 when all of the following hold: the timestamp difference from the previous row is less than 60000, the value is greater than 100000 or less than -100000, and type did not change from the previous row.

So the expected output is:

    timestamp        type        value
0   1629453592426    0           10
1   1629453594429    0           0
2   1629453654582    0           110000
3   1629453658662    1           22
4   1629453661342    1           44
5   1629453686644    0           115000
6   1629453689849    0           50

I tried to do it by creating a timestamp_jump column with:

df["timestamp_jump"] = df["timestamp"].diff().gt(60000).cumsum()

And then grouping the dataframe by timestamp_jump and type columns:

df.groupby(["timestamp_jump", "type"])[["value"]]

But I couldn't figure out the remaining steps. How can I get the expected dataframe?

Upvotes: 0

Views: 81

Answers (4)

SeaBean

Reputation: 23217

You can label each run of consecutive rows with the same type by comparing each row's type with the previous row's type and taking .cumsum() of the result.

Then use .mask on the 2 conditions:

  1. value > 100000 or < -100000
  2. timestamp difference < 60000 (within the same group of consecutive same type)

to change value to 0, as follows:

# set group of consecutive same `type`
g = df['type'].ne(df['type'].shift()).cumsum()

df['value'] = df['value'].mask(
    df['value'].abs().gt(100000)
    & df.groupby(g)['timestamp'].diff().lt(60000),
    0,
)

Result:

print(df)

       timestamp  type   value
0  1629453592426     0      10
1  1629453594429     0       0
2  1629453654582     0  110000
3  1629453658662     1      22
4  1629453661342     1      44
5  1629453686644     0  115000
6  1629453689849     0      50
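
To see why only row 1 changes, it can help to print the intermediate pieces. This is just a minimal sketch that rebuilds the sample frame from the question by hand; the names `dt`, `big`, `close` and `debug` are illustrative and not part of the code above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "timestamp": [1629453592426, 1629453594429, 1629453654582,
                  1629453658662, 1629453661342, 1629453686644,
                  1629453689849],
    "type": [0, 0, 0, 1, 1, 0, 0],
    "value": [10, 120000, 110000, 22, 44, 115000, 50],
})

# A new group starts whenever `type` differs from the previous row
g = df["type"].ne(df["type"].shift()).cumsum()

# Timestamp difference to the previous row, restarted in every group,
# so the first row of each group is NaN and never passes .lt(60000)
dt = df.groupby(g)["timestamp"].diff()

big = df["value"].abs().gt(100000)   # condition 1
close = dt.lt(60000)                 # condition 2

# Illustrative helper frame for inspecting the conditions side by side
debug = pd.DataFrame({"group": g, "dt": dt, "big": big,
                      "close": close, "mask": big & close})
print(debug)
```

Row 2 has a large value but its difference (60153) is not below 60000, and row 5 is the first row of a new group, so its difference is NaN. Only row 1 satisfies both conditions.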

Upvotes: 1

Corralien

Reputation: 120429

Another way:

df.loc[df['type'].eq(df['type'].shift())
       & df['timestamp'].diff().lt(60000)
       & df['value'].abs().gt(100000), 'value'] = 0

Output:

>>> df
       timestamp  type   value
0  1629453592426     0      10
1  1629453594429     0       0
2  1629453654582     0  110000
3  1629453658662     1      22
4  1629453661342     1      44
5  1629453686644     0  115000
6  1629453689849     0      50
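
If you need this in more than one place, the same three conditions could be wrapped in a small helper. The sketch below is only an illustration: `zero_spikes`, `max_gap` and `limit` are hypothetical names, and the sample data is made up to include a negative value so the `.abs()` part is exercised.

```python
import pandas as pd

def zero_spikes(df, max_gap=60000, limit=100000):
    """Return a copy of df with `value` set to 0 where the row has the
    same type as the previous row, is less than max_gap after it, and
    has |value| greater than limit. (Hypothetical helper.)"""
    same_type = df["type"].eq(df["type"].shift())
    close = df["timestamp"].diff().lt(max_gap)
    big = df["value"].abs().gt(limit)
    out = df.copy()  # leave the caller's frame untouched
    out.loc[same_type & close & big, "value"] = 0
    return out

# Made-up data: row 1 is a negative spike right after row 0,
# row 3 changes type, row 4 comes too long after row 3
sample = pd.DataFrame({
    "timestamp": [0, 1000, 2000, 3000, 70000],
    "type": [0, 0, 0, 1, 1],
    "value": [150000, -150000, 5, 200000, 300000],
})

result = zero_spikes(sample)
print(result)
```

Here only row 1 is zeroed: row 0 has no previous row, row 3 follows a type change, and row 4 is 67000 after row 3.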

Upvotes: 2

Yadnesh Salvi

Reputation: 195

Try iterating over the rows of the df with iterrows() and applying the conditions to the current row and the previous row, as below:

import pandas as pd

df = pd.DataFrame()
df['timestamp'] = [1629453592426, 1629453594429, 1629453654582, 1629453658662, 1629453661342, 1629453686644, 1629453689849]
df['type'] = [0,0,0,1,1,0,0]
df['value'] = [10,120000,110000,22,44,115000,50]

df

       timestamp  type   value
0  1629453592426     0      10
1  1629453594429     0  120000
2  1629453654582     0  110000
3  1629453658662     1      22
4  1629453661342     1      44
5  1629453686644     0  115000
6  1629453689849     0      50


for index, row in df.iterrows():
    if index != 0:
        # assumes the default RangeIndex, so index - 1 is the previous row
        prev = df.iloc[index - 1]
        close_in_time = abs(row['timestamp'] - prev['timestamp']) < 60000
        large_value = abs(row['value']) > 100000  # also covers value < -100000
        same_type = row['type'] == prev['type']
        if close_in_time and large_value and same_type:
            df.at[index, 'value'] = 0

df

       timestamp  type   value
0  1629453592426     0      10
1  1629453594429     0       0
2  1629453654582     0  110000
3  1629453658662     1      22
4  1629453661342     1      44
5  1629453686644     0  115000
6  1629453689849     0      50

Upvotes: 0

Hoài Lâm

Reputation: 81

Try this:

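# assumes the default RangeIndex (labels 0, 1, 2, ...)
for i in range(1, df.shape[0]):
    close_in_time = (df.loc[i, 'timestamp'] - df.loc[i - 1, 'timestamp']) < 60000
    large_value = abs(df.loc[i, 'value']) > 100000
    same_type = df.loc[i, 'type'] == df.loc[i - 1, 'type']
    # all three conditions must hold; .loc avoids chained assignment
    if close_in_time and large_value and same_type:
        df.loc[i, 'value'] = 0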

Upvotes: 0
