Jackson Tale

Reputation: 25812

How to replace None only with empty string using pandas?

The code below generates a DataFrame:

import pandas as pd
from datetime import datetime as dt
import numpy as np

dates = [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None]
strings1 = ['A', 'B',None, 'C']
strings2 = [None, 'B','C', 'C']
strings3 = ['A', 'B','C', None]
vals = [1.,2.,np.nan, 4.]
df = pd.DataFrame(dict(zip(['A','B','C','D','E'],
                           [strings1, dates, strings2, strings3, vals])))



+---+------+---------------------+------+------+-----+
|   |  A   |          B          |  C   |  D   |  E  |
+---+------+---------------------+------+------+-----+
| 0 | A    | 2014-01-02 02:00:00 | None | A    | 1   |
| 1 | B    | 2014-01-02 03:00:00 | B    | B    | 2   |
| 2 | None | 2014-01-02 04:00:00 | C    | C    | NaN |
| 3 | C    | NaT                 | C    | None | 4   |
+---+------+---------------------+------+------+-----+

I would like to replace every None (the real Python None, not the string 'None') in it with '' (an empty string).

The expected df is

+---+---+---------------------+---+---+-----+
|   | A |          B          | C | D |  E  |
+---+---+---------------------+---+---+-----+
| 0 | A | 2014-01-02 02:00:00 |   | A | 1   |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2   |
| 2 |   | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT                 | C |   | 4   |
+---+---+---------------------+---+---+-----+

What I tried is:

df = df.replace([None], [''], regex=True)

But I got

+---+---+---------------------+---+------+---+
|   | A |          B          | C |  D   | E |
+---+---+---------------------+---+------+---+
| 0 | A | 1388628000000000000 |   | A    | 1 |
| 1 | B | 1388631600000000000 | B | B    | 2 |
| 2 |   | 1388635200000000000 | C | C    |   |
| 3 | C |                     | C |      | 4 |
+---+---+---------------------+---+------+---+

  1. All the dates become big numbers.
  2. Even NaT and NaN are replaced, which I don't want.

How can I achieve this correctly and efficiently?

Upvotes: 29

Views: 90131

Answers (5)

drorhun

Reputation: 584

You can use the replace method. Note that the call below maps empty strings to NaN, which is the reverse direction of what the question asks:

df.replace('', np.nan, inplace=True)


Upvotes: 3

GeoStoneMarten

Reputation: 583

This is sufficient

df.fillna("",inplace=True)
df
Out[142]: 
   A                    B  C  D  E
0  A  2014-01-02 02:00:00     A  1
1  B  2014-01-02 03:00:00  B  B  2
2     2014-01-02 04:00:00  C  C   
3  C                       C     4

Edit 2021-07-26: expanded answer following @dWitty's comment.

If you really want to keep the NaT and NaN values in the non-text columns, you only need to fill NA in your text columns. In your example these are A, C and D.

Pass a dict of replacement values keyed by column name; the value can be different for each column. For your case you just need to construct the dict:

# default values to replace NA (None) in the text columns
# equivalent to values = {"A": "", "C": "", "D": ""}
values = {col: "" for col in ['A', 'C', 'D']}
df.fillna(value=values, inplace=True)
df
Out[142]: 
   A                   B  C  D    E
0  A 2014-01-02 02:00:00     A  1.0
1  B 2014-01-02 03:00:00  B  B  2.0
2    2014-01-02 04:00:00  C  C  NaN
3  C                 NaT  C     4.0

Upvotes: 39

Gulzar

Reputation: 27896

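For those who are trying to replace None, and not just np.nan (which is covered here):

default_value = ""
# applymap visits every element; df.apply would pass whole columns, which are never None
# (on pandas >= 2.1, DataFrame.map is the newer name for applymap)
df = df.applymap(lambda x: default_value if x is None else x)

This makes a nice one-liner.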
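To illustrate why the identity check matters, here is a minimal sketch on a single object column that holds both a real None and a NaN. The Series `s` and the name `cleaned` are made up for this example, and `dtype=object` is passed explicitly so that pandas keeps None as a Python object instead of converting it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column: one real None and one NaN, stored as objects.
s = pd.Series(["A", None, np.nan, "C"], dtype=object)

default_value = ""

# `x is None` matches only the Python None singleton, so NaN passes through.
cleaned = s.map(lambda x: default_value if x is None else x)

print(cleaned.tolist())
```

By contrast, fillna would treat both the None and the NaN as missing and fill both.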

Upvotes: 2

Ricky McMaster

Reputation: 4647

Since the relevant columns you wish to alter are all objects, you could just specify this with the dtype attribute (for completeness I added in string and unicode) and use fillna.

So:

for c in df:
    if str(df[c].dtype) in ('object', 'string_', 'unicode_'):
        # assign back instead of calling fillna(inplace=True) on a column selection
        df[c] = df[c].fillna(value='')

This will leave numeric and date columns unaffected.

To see the data types for all columns:

df.dtypes 
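As a possible alternative to the explicit loop, the object columns can be picked out with select_dtypes and filled in one assignment. This is only a sketch: the small frame below is a cut-down stand-in for the question's data, the name `text_cols` is invented here, and the text column is built with `dtype=object` on the assumption that this is how the strings are stored.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame (hypothetical sample data).
df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": pd.Series(pd.to_datetime(["2014-01-02 02:00", "2014-01-02 03:00",
                                   "2014-01-02 04:00", None])),
    "E": [1.0, 2.0, np.nan, 4.0],
})

# Select only the object (text) columns, then fill missing values there.
text_cols = df.select_dtypes(include="object").columns
df[text_cols] = df[text_cols].fillna("")

print(df)
```

Note that fillna treats a NaN inside an object column as missing too, so this fills both None and NaN in the text columns while leaving the datetime and float columns alone.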

Upvotes: 6

EdChum

Reputation: 393863

It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:

In [126]:
mask = df.applymap(lambda x: x is None)
cols = df.columns[(mask).any()]
for col in df[cols]:
    df.loc[mask[col], col] = ''
df

Out[126]:
   A                   B  C  D   E
0  A 2014-01-02 02:00:00     A   1
1  B 2014-01-02 03:00:00  B  B   2
2    2014-01-02 04:00:00  C  C NaN
3  C                 NaT  C      4

So we generate a mask of the None values using applymap, then iterate over each column that contains a None and use the boolean mask to set those values to ''.
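The same mask idea can be wrapped in a small helper that skips non-object columns entirely, which may help on wide frames. This is a sketch under a few assumptions: the function name `replace_none` and the sample frame are hypothetical, column names are assumed to be unique, and only columns with object dtype are checked, since datetime and float columns store NaT/NaN rather than None.

```python
import numpy as np
import pandas as pd

def replace_none(frame, value=""):
    """Return a copy of `frame` with real None objects replaced by `value`."""
    out = frame.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # non-object columns cannot hold a Python None
        # Boolean mask that is True only where the element is the None object.
        is_none = np.array([x is None for x in out[col]], dtype=bool)
        if is_none.any():
            out.loc[is_none, col] = value
    return out

# Hypothetical sample data: None and NaN side by side in an object column.
sample = pd.DataFrame({
    "A": pd.Series(["A", None, np.nan], dtype=object),
    "E": [1.0, np.nan, 3.0],
})
result = replace_none(sample)
print(result)
```

Because the helper works on a copy, the original frame is left untouched; drop the copy if in-place modification is preferred.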

Upvotes: 17
