Reputation: 67
I have a large csv file that I import to pandas. The first row is the header of columns (even if it is values). My goal is to replace all values of -999.999 to 0.
I have already tried to make my first row as header with df.colums = df.iloc[0] but without success. To replace the values I tried to use df.replace but it does not seem to find the -999.99 values.
import pandas as pd
df = pd.read_csv('Test.csv') # importer le dossier CSV
df.columns = df.iloc[0]
df.replace(-999.999, 0, inplace=True)
df.to_csv('test2.csv')
The CSV file looks like this where the first column (0-1-2...) was automatically created and not part of the original data.
-120;-119.7;-119.4;-119.1;-118.8;-118.5;-118.2;-117.9
0 -999.999;-999.999;-999.999;20.1;1.2;89.5;-999....
1 -999.999;-999.999;4.6;25.6;8.9;-999.999;-999.9...
2 -999.999;-999.999;8.66;12.6;7.6;-999.999;13.98...
3 -999.999;-999.999;9.12;6.98;0.6;45.3;12.4;-999...
4 -999.999;-999.999;13.36;4.56;3.9;2.36;-999.999...
5 -999.999;-999.999;-999.999;21;45.2;-999.999;-9...
Upvotes: 0
Views: 3904
Reputation: 91
pd.read_csv()
uses a default separator ','
- please pass the argument sep=';'
for your data. To eliminate the index in the output just pass the argument index=False
to df.to_csv()
.
Upvotes: 0
Reputation: 14103
First problem is that you need to specify the separator the second problem with replace
is that it will be converted to float64
so using replace(-999.999, 0)
will not work.
from io import StringIO
s = '''-120;-119.7;-119.4;-119.1;-118.8;-118.5;-118.2;-117.9
-999.999;-999.999;-999.999;20.1;1.2;89.5;-999
-999.999;-999.999;4.6;25.6;8.9;-999.999;-999.9
-999.999;-999.999;8.66;12.6;7.6;-999.999;13.98
-999.999;-999.999;9.12;6.98;0.6;45.3;12.4;-999
-999.999;-999.999;13.36;4.56;3.9;2.36;-999.999
-999.999;-999.999;-999.999;21;45.2;-999.999;-9'''
df = pd.read_csv(StringIO(s), sep=';')
df = df.replace(df.iloc[0,0], 0) # specify a cell with the value you want to replace
or you can use a float64
number:
df.replace(-999.9989999999999, 0)
-120 -119.7 -119.4 -119.1 -118.8 -118.5 -118.2 -117.9
0 0.0 0.0 0.00 20.10 1.2 89.50 -999.00 NaN
1 0.0 0.0 4.60 25.60 8.9 0.00 -999.90 NaN
2 0.0 0.0 8.66 12.60 7.6 0.00 13.98 NaN
3 0.0 0.0 9.12 6.98 0.6 45.30 12.40 -999.0
4 0.0 0.0 13.36 4.56 3.9 2.36 0.00 NaN
5 0.0 0.0 0.00 21.00 45.2 0.00 -9.00 NaN
For an actual file StringIO
will be replaced with the file path.
The first column, as you call it, is the index
if you do not want it in the output just df.to_csv('file.csv', index=False)
Upvotes: 2