Mat17
Mat17

Reputation: 67

How to replace a specific value in an entire CSV file with pandas?

I have a large csv file that I import to pandas. The first row is the header of columns (even if it is values). My goal is to replace all values of -999.999 to 0.

I have already tried to make my first row as header with df.colums = df.iloc[0] but without success. To replace the values I tried to use df.replace but it does not seem to find the -999.99 values.

import pandas as pd

df = pd.read_csv('Test.csv') # importer le dossier CSV 

df.columns = df.iloc[0]
df.replace(-999.999, 0, inplace=True)

df.to_csv('test2.csv')

The CSV file looks like this where the first column (0-1-2...) was automatically created and not part of the original data.

  -120;-119.7;-119.4;-119.1;-118.8;-118.5;-118.2;-117.9
0  -999.999;-999.999;-999.999;20.1;1.2;89.5;-999....   
1  -999.999;-999.999;4.6;25.6;8.9;-999.999;-999.9...   
2  -999.999;-999.999;8.66;12.6;7.6;-999.999;13.98...   
3  -999.999;-999.999;9.12;6.98;0.6;45.3;12.4;-999...   
4  -999.999;-999.999;13.36;4.56;3.9;2.36;-999.999...   
5  -999.999;-999.999;-999.999;21;45.2;-999.999;-9...   

Upvotes: 0

Views: 3904

Answers (2)

lcieslak
lcieslak

Reputation: 91

pd.read_csv() uses a default separator ',' - please pass the argument sep=';' for your data. To eliminate the index in the output just pass the argument index=False to df.to_csv().

Upvotes: 0

It_is_Chris
It_is_Chris

Reputation: 14103

First problem is that you need to specify the separator the second problem with replace is that it will be converted to float64 so using replace(-999.999, 0) will not work.

from io import StringIO

s = '''-120;-119.7;-119.4;-119.1;-118.8;-118.5;-118.2;-117.9
-999.999;-999.999;-999.999;20.1;1.2;89.5;-999
-999.999;-999.999;4.6;25.6;8.9;-999.999;-999.9
-999.999;-999.999;8.66;12.6;7.6;-999.999;13.98
-999.999;-999.999;9.12;6.98;0.6;45.3;12.4;-999
-999.999;-999.999;13.36;4.56;3.9;2.36;-999.999
-999.999;-999.999;-999.999;21;45.2;-999.999;-9'''

df = pd.read_csv(StringIO(s), sep=';')

df = df.replace(df.iloc[0,0], 0) # specify a cell with the value you want to replace

or you can use a float64 number:

df.replace(-999.9989999999999, 0)

   -120  -119.7  -119.4  -119.1  -118.8  -118.5  -118.2  -117.9
0   0.0     0.0    0.00   20.10     1.2   89.50 -999.00     NaN
1   0.0     0.0    4.60   25.60     8.9    0.00 -999.90     NaN
2   0.0     0.0    8.66   12.60     7.6    0.00   13.98     NaN
3   0.0     0.0    9.12    6.98     0.6   45.30   12.40  -999.0
4   0.0     0.0   13.36    4.56     3.9    2.36    0.00     NaN
5   0.0     0.0    0.00   21.00    45.2    0.00   -9.00     NaN

For an actual file StringIO will be replaced with the file path.

The first column, as you call it, is the index if you do not want it in the output just df.to_csv('file.csv', index=False)

Upvotes: 2

Related Questions