madflow

Reputation: 8520

Pandas Missing Data

I come from an SPSS background and I want to declare missing values in a Pandas DataFrame.

Consider the following dataset from a Likert Scale:

SELECT COUNT(*),v_6 FROM datatable GROUP BY v_6;

| COUNT(*) | v_6  |
+----------+------+
|     1268 | NULL |
|        2 |  -77 |
|     3186 |    1 |
|     2700 |    2 |
|      512 |    3 |
|       71 |    4 |
|       17 |    5 |
|       14 |    6 |

I have a DataFrame

pdf = psql.frame_query('SELECT * FROM datatable', con)

The NULL values already come through as NaN. Now I want -77 to be treated as a missing value as well.

In SPSS I am used to:

MISSING VALUES v_6 (-77).

Now I am looking for the Pandas counterpart.

I have read:

http://pandas.pydata.org/pandas-docs/stable/missing_data.html

but I honestly do not see how to apply the approach described there to my case.

Upvotes: 3

Views: 979

Answers (1)

roman

Reputation: 117571

Use pandas.Series.replace():

import numpy as np
df['v_6'] = df['v_6'].replace(-77, np.nan)  # treat -77 as missing in v_6 only
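
To see the effect end to end, here is a minimal sketch on a made-up sample. The values in `pdf` are hypothetical and only stand in for the `v_6` column from the question; the real data would come from the SQL query.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the v_6 column from the question;
# None plays the role of the SQL NULLs that pandas already loads as NaN.
pdf = pd.DataFrame({"v_6": [1, 2, -77, 3, None, 1, -77, 6]})

# Declare -77 as missing for this column only.
pdf["v_6"] = pdf["v_6"].replace(-77, np.nan)

# Count missing entries (the original NULL plus the two former -77 codes).
n_missing = int(pdf["v_6"].isna().sum())

# Frequency table that keeps NaN as its own row, similar to the SQL GROUP BY.
counts = pdf["v_6"].value_counts(dropna=False)

# Statistics such as mean() skip NaN by default.
mean_v6 = pdf["v_6"].mean()
```

If a variable has several missing codes, `replace()` also accepts a list of values to replace, so they can all be mapped to NaN in one call.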

Upvotes: 3
