Reputation: 35
To be clear up front: this is not about removing whole rows.
In a simple case I have a file from a sensor:
import pandas as pd
df = pd.DataFrame({'Date': ['15/03/2019 10:00:11.000', '15/03/2019 10:00:12.000', '15/03/2019 10:00:13.000'],
                   'Pressure': [-0.162, -0.162, 1.456],
                   'Conductivity': [-0.001, -0.001, 7.45],
                   'Water_Temperature': [7.555, 7.555, 8.22],
                   'Water_Salinity': [0.004, 0.004, 7.63]})
I need to remove the values in rows where 'Pressure' < 1, ideally without removing 'Date' (information about the amount of missing and incorrect values is also important).
I tried to do it with the .where method:
condition = df['Pressure'] < 1
dropped_df = df.where(condition, " ")
but it did nothing. I also don't know how to extend the condition to the rest of the columns, especially since the original file may contain more of them.
The resulting effect should look like this:
df = pd.DataFrame({'Date': ['15/03/2019 10:00:11.000', '15/03/2019 10:00:12.000', '15/03/2019 10:00:13.000'],
                   'Pressure': [None, None, 1.456],
                   'Conductivity': [None, None, 7.45],
                   'Water_Temperature': [None, None, 8.22],
                   'Water_Salinity': [None, None, 7.63]})
Thanks for looking!
Upvotes: 2
Views: 60
Reputation: 863481
Use DataFrame.mask to blank out all columns except the first one (selected by DataFrame.iloc) wherever the condition holds:
df.iloc[:, 1:] = df.iloc[:, 1:].mask(df['Pressure'] < 1)
print(df)
Date Pressure Conductivity Water_Temperature \
0 15/03/2019 10:00:11.000 NaN NaN NaN
1 15/03/2019 10:00:12.000 NaN NaN NaN
2 15/03/2019 10:00:13.000 1.456 7.45 8.22
Water_Salinity
0 NaN
1 NaN
2 7.63
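Equivalently, DataFrame.where from the question works too once the condition is inverted, since where keeps values where the condition is True (and replaces the rest with NaN by default). A minimal sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['15/03/2019 10:00:11.000',
                            '15/03/2019 10:00:12.000',
                            '15/03/2019 10:00:13.000'],
                   'Pressure': [-0.162, -0.162, 1.456],
                   'Conductivity': [-0.001, -0.001, 7.45],
                   'Water_Temperature': [7.555, 7.555, 8.22],
                   'Water_Salinity': [0.004, 0.004, 7.63]})

# where() keeps values where the condition is True, so invert it:
# keep rows with Pressure >= 1, set everything else (except Date) to NaN
df.iloc[:, 1:] = df.iloc[:, 1:].where(~(df['Pressure'] < 1))
print(df)
```

This is why the original attempt seemed to do nothing: `df.where(condition)` with `condition = df['Pressure'] < 1` keeps exactly the rows that should have been removed.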
If you really need empty strings, be aware that the columns then mix numeric values with strings, so numeric operations on them will fail:

df.iloc[:, 1:] = df.iloc[:, 1:].mask(df['Pressure'] < 1, '')
print(df)
Date Pressure Conductivity Water_Temperature \
0 15/03/2019 10:00:11.000
1 15/03/2019 10:00:12.000
2 15/03/2019 10:00:13.000 1.456 7.45 8.22
Water_Salinity
0
1
2 7.63
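Keeping NaN rather than empty strings also preserves the "amount of missing values" information the question asks for: a sketch of tallying the blanked-out values per column with isna().sum() (with '' this count would come back as zero):

```python
import pandas as pd

df = pd.DataFrame({'Date': ['15/03/2019 10:00:11.000',
                            '15/03/2019 10:00:12.000',
                            '15/03/2019 10:00:13.000'],
                   'Pressure': [-0.162, -0.162, 1.456],
                   'Conductivity': [-0.001, -0.001, 7.45],
                   'Water_Temperature': [7.555, 7.555, 8.22],
                   'Water_Salinity': [0.004, 0.004, 7.63]})

# Blank out the bad readings, keeping Date intact
df.iloc[:, 1:] = df.iloc[:, 1:].mask(df['Pressure'] < 1)

# NaN doubles as a per-column tally of removed/incorrect values
missing = df.isna().sum()
print(missing)
```

Here `missing` reports 2 removed values for each sensor column and 0 for 'Date'.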
Upvotes: 3