Reputation: 1140
Is there a way to check if all the column values in the current row is less than the corresponding column values in previous row in a pandas dataframe (throughout entire dataframe) and create a new column with a value of 1 or 0 accoringly?
Upvotes: 4
Views: 5546
Reputation: 294488
Consider the dataframe df
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(10, 4), columns=list('ABCD'))
A B C D
0 0.444939 0.407554 0.460148 0.465239
1 0.462691 0.016545 0.850445 0.817744
2 0.777962 0.757983 0.934829 0.831104
3 0.879891 0.926879 0.721535 0.117642
4 0.145906 0.199844 0.437564 0.100702
5 0.278735 0.609862 0.085823 0.836997
6 0.739635 0.866059 0.691271 0.377185
7 0.225146 0.435280 0.700900 0.700946
8 0.796487 0.018688 0.700566 0.900749
9 0.764869 0.253200 0.548054 0.778883
Option 1
shift
df.assign(New=(df < df.shift()).all(1).astype(int))
A B C D New
0 0.444939 0.407554 0.460148 0.465239 0
1 0.462691 0.016545 0.850445 0.817744 0
2 0.777962 0.757983 0.934829 0.831104 0
3 0.879891 0.926879 0.721535 0.117642 0
4 0.145906 0.199844 0.437564 0.100702 1
5 0.278735 0.609862 0.085823 0.836997 0
6 0.739635 0.866059 0.691271 0.377185 0
7 0.225146 0.435280 0.700900 0.700946 0
8 0.796487 0.018688 0.700566 0.900749 0
9 0.764869 0.253200 0.548054 0.778883 0
Option 2
numpy
Same concept as Option 1
v = df.values
df.assign(New=np.append(False, (v[1:] < v[:-1]).all(1).astype(int)))
A B C D New
0 0.444939 0.407554 0.460148 0.465239 0
1 0.462691 0.016545 0.850445 0.817744 0
2 0.777962 0.757983 0.934829 0.831104 0
3 0.879891 0.926879 0.721535 0.117642 0
4 0.145906 0.199844 0.437564 0.100702 1
5 0.278735 0.609862 0.085823 0.836997 0
6 0.739635 0.866059 0.691271 0.377185 0
7 0.225146 0.435280 0.700900 0.700946 0
8 0.796487 0.018688 0.700566 0.900749 0
9 0.764869 0.253200 0.548054 0.778883 0
Option 3
diff
Use diff
to compare one row with the next and see if it is less than zero. Then use all
to determine if its True
for the entire row.
df.assign(New=df.diff().lt(0).all(1).astype(int))
A B C D New
0 0.444939 0.407554 0.460148 0.465239 0
1 0.462691 0.016545 0.850445 0.817744 0
2 0.777962 0.757983 0.934829 0.831104 0
3 0.879891 0.926879 0.721535 0.117642 0
4 0.145906 0.199844 0.437564 0.100702 1
5 0.278735 0.609862 0.085823 0.836997 0
6 0.739635 0.866059 0.691271 0.377185 0
7 0.225146 0.435280 0.700900 0.700946 0
8 0.796487 0.018688 0.700566 0.900749 0
9 0.764869 0.253200 0.548054 0.778883 0
Timing
%timeit df.assign(New=df.diff().lt(0).all(1).astype(int))
%timeit df.assign(New=(df < df.shift()).all(1).astype(int))
1000 loops, best of 3: 579 µs per loop
1000 loops, best of 3: 1.56 ms per loop
%%timeit
v = df.values
df.assign(New=np.append(False, (v[1:] < v[:-1]).all(1).astype(int)))
1000 loops, best of 3: 322 µs per loop
How diff
Works
check if all the column values in the current row is less than the corresponding column values in previous row
This is what prompted me to use diff
. pandas.DataFrame.diff
by default calculates the diff array for every column. Meaning, for each row, we have the difference of that row relative to the previous row. For the condition the OP presented to be True
, we need this difference to be less than zero.
df.diff()
A B C D
0 NaN NaN NaN NaN
1 0.017752 -0.391009 0.390297 0.352505
2 0.315271 0.741438 0.084384 0.013360
3 0.101929 0.168895 -0.213294 -0.713463
4 -0.733985 -0.727035 -0.283971 -0.016940
5 0.132829 0.410018 -0.351741 0.736296
6 0.460900 0.256197 0.605448 -0.459812
7 -0.514489 -0.430779 0.009629 0.323761
8 0.571340 -0.416592 -0.000334 0.199803
9 -0.031618 0.234512 -0.152512 -0.121866
Then
df.diff() < 0
A B C D
0 False False False False
1 False True False False
2 False False False False
3 False False True True
4 True True True True
5 False False True False
6 False False False True
7 True True False False
8 False True True False
9 True False True True
Then
(df.diff() < 0).all(1)
0 False
1 False
2 False
3 False
4 True
5 False
6 False
7 False
8 False
9 False
dtype: bool
Upvotes: 7