Reputation: 3890
I have a data set which goes like this:-
2017-03-01 31.8 28.0 32.6
2017-04-01 31.6 28.0 32.6
2017-05-01 31.0 27.0 32.6
2017-06-01 31.0 27.0 32.4
2017-07-01 31.0 27.0 31.4
2017-08-01 30.0 27.0 32.6
Apart from first column, rest of column are temperature. What I would like to do is to compare the values of 4th column (last from right) with other column values to find if temperature values are not more or less than two deg (of 4th column). For example, I would like to count how many times all three columns (row wise) has the value between 30.6 to 34.6.
Is there a function available under pandas to do that?
Upvotes: 0
Views: 93
Reputation: 76917
Values of columns a
b
that is less than more than 2 of c
In [726]: (df[['a', 'b']].sub(df['c'], axis=0).abs() < 2).all(1).sum()
Out[726]: 0
In [727]: (df[['a', 'b']].sub(df['c'], axis=0).abs() < 2)
Out[727]:
a b
0 True False
1 True False
2 True False
3 True False
4 True False
5 False False
Values between 30.6 to 34.6
In [671]: (df[['a', 'b', 'c']] > 30.6) & (df[['a', 'b', 'c']] < 34.6)
Out[671]:
a b c
0 True False True
1 True False True
2 True False True
3 True False True
4 True False True
5 False False True
Values between 30.6 to 34.6, for all columns in a row to be True
In [672]: ((df[['a', 'b', 'c']] > 30.6) & (df[['a', 'b', 'c']] < 34.6)).all(1)
Out[672]:
0 False
1 False
2 False
3 False
4 False
5 False
dtype: bool
Count of rows where values are between 30.6 to 34.6 for all columns
In [673]: ((df[['a', 'b', 'c']] > 30.6) & (df[['a', 'b', 'c']] < 34.6)).all(1).sum()
Out[673]: 0
Upvotes: 2
Reputation: 57033
If I understand your question correctly, you want to know how many times all of the values are withing the range ±2 of the fourth column (the columns are numbered 0,1,2,3):
(((df[3] - df[1]).abs() < 2) & ((df[3] - df[2]).abs() < 2)).sum()
#0
Upvotes: 2
Reputation: 323226
Base on you sample data , you can try :
df2[['t1','t2','t3']].apply(lambda x : abs(x-df2['t3'])<2).sum(axis=1)==3
Out[425]:
0 False
1 False
2 False
3 False
4 False
5 False
dtype: bool
Data Input
df2
Out[426]:
Time t1 t2 t3
0 2017-03-01 31.8 28 32.6
1 2017-04-01 31.6 28 32.6
2 2017-05-01 31.0 27 32.6
3 2017-06-01 31.0 27 32.4
4 2017-07-01 31.0 27 31.4
5 2017-08-01 30.0 27 32.6
Upvotes: 2
Reputation: 1050
You could use dataframe.apply
to perform the comparison for each row, and sum the results of the comparison.
import pandas
df = pandas.DataFrame(
data=[
[31.8, 28.0, 32.6],
[31.6, 28.0, 32.6],
[31.0, 27.0, 32.6],
[31.0, 27.0, 32.4],
[31.0, 27.0, 31.4],
[30.0, 27.0, 32.6]
],
index=['2017-03-01', '2017-04-01', '2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01']
)
df['count'] = df.apply(lambda x: sum((x > 30.6) & (x < 34.6)), axis=1)
print(df)
Since True == 1
and False == 0
, you'll get a count of the columns that meet your criteria
Upvotes: 1