How to replace None only with empty string using pandas?

The code below generates a df:

import pandas as pd
from datetime import datetime as dt
import numpy as np

dates = [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None]
strings1 = ['A', 'B',None, 'C']
strings2 = [None, 'B','C', 'C']
strings3 = ['A', 'B','C', None]
vals = [1.,2.,np.nan, 4.]
df = pd.DataFrame(dict(zip(['A','B','C','D','E'],
                           [strings1, dates, strings2, strings3, vals])))



+---+------+---------------------+------+------+-----+
|   |  A   |          B          |  C   |  D   |  E  |
+---+------+---------------------+------+------+-----+
| 0 | A    | 2014-01-02 02:00:00 | None | A    | 1   |
| 1 | B    | 2014-01-02 03:00:00 | B    | B    | 2   |
| 2 | None | 2014-01-02 04:00:00 | C    | C    | NaN |
| 3 | C    | NaT                 | C    | None | 4   |
+---+------+---------------------+------+------+-----+

I would like to replace every None (a real Python None, not the string 'None') in it with '' (an empty string).

The expected df is

+---+---+---------------------+---+---+-----+
|   | A |          B          | C | D |  E  |
+---+---+---------------------+---+---+-----+
| 0 | A | 2014-01-02 02:00:00 |   | A | 1   |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2   |
| 2 |   | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT                 | C |   | 4   |
+---+---+---------------------+---+---+-----+

What I did is

df = df.replace([None], [''], regex=True)

But I got

+---+---+---------------------+---+------+---+
|   | A |          B          | C |  D   | E |
+---+---+---------------------+---+------+---+
| 0 | A | 1388628000000000000 |   | A    | 1 |
| 1 | B | 1388631600000000000 | B | B    | 2 |
| 2 |   | 1388635200000000000 | C | C    |   |
| 3 | C |                     | C |      | 4 |
+---+---+---------------------+---+------+---+

All the dates become big numbers, and even NaT and NaN are replaced, which I don't want.

How can I achieve this correctly and efficiently?

Upvotes: 29

Answers (5)

drorhun

Reputation: 584

You can use the replace method. Note that the call below goes in the opposite direction to the question: it turns empty strings into NaN.

df.replace('', np.nan, inplace=True)

Upvotes: 3

GeoStoneMarten

Reputation: 583

This is sufficient if you don't mind NaN and NaT being blanked as well:

df.fillna("",inplace=True)

df
Out[142]: 
   A                    B  C  D  E
0  A  2014-01-02 02:00:00     A  1
1  B  2014-01-02 03:00:00  B  B  2
2     2014-01-02 04:00:00  C  C   
3  C                       C     4

Edit 2021-07-26: expanded the answer following @dWitty's comment.

If you really want to keep NaT and NaN values in the non-text columns, you only need to fill the missing values in your text columns. In your example these are A, C and D.

Pass fillna a dict that maps each column to its replacement value; the value can differ per column. For your case you just need to construct the dict:

# replacement value per column for missing entries (None)
# equivalent to {"A": "", "C": "", "D": ""}
values = {col: "" for col in ['A', 'C', 'D']}
df.fillna(value=values, inplace=True)

df
Out[142]: 
   A                   B  C  D    E
0  A 2014-01-02 02:00:00     A  1.0
1  B 2014-01-02 03:00:00  B  B  2.0
2    2014-01-02 04:00:00  C  C  NaN
3  C                 NaT  C     4.0

Upvotes: 39

Gulzar

Reputation: 27896

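For those who are trying to replace None itself, and not just np.nan, here is a short element-wise version. Note that df.apply on its own passes whole columns to the function, so the None check has to go through Series.map:

default_value = ""
# apply hands each column to the outer lambda; map then visits every cell
df = df.apply(lambda col: col.map(lambda x: default_value if x is None else x))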

Upvotes: 2

Ricky McMaster

Reputation: 4647

Since the columns you wish to alter all have object dtype, you can check each column's dtype attribute (for completeness I also included the pandas string dtype) and use fillna only on those columns.

So:

for c in df:
    if str(df[c].dtype) in ('object', 'string'):
        # assign the result back instead of filling a column selection in place
        df[c] = df[c].fillna('')

This will leave numeric and date columns unaffected.

To see the data types for all columns:

df.dtypes
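
As a rough, self-contained sketch of the dtype-driven approach: the frame below is a cut-down stand-in for the question's data (only columns A, B and E), and the name text_cols is made up for illustration. The text column is built with dtype=object on purpose, so that None is stored as-is regardless of how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame (assumption: three columns are enough
# to show the idea). dtype=object keeps None as a real None in column A.
df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": pd.Series(pd.to_datetime(
        ["2014-01-02 02:00", "2014-01-02 03:00", "2014-01-02 04:00", None]
    )),
    "E": [1.0, 2.0, np.nan, 4.0],
})

# Collect the object-dtype columns, then fill missing values only there.
text_cols = [c for c in df.columns if df[c].dtype == object]
for c in text_cols:
    df[c] = df[c].fillna("")

print(df)
print(df.dtypes)
```

Keep in mind that fillna treats None and NaN alike, so a float NaN stored inside an object column would be blanked too.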

Upvotes: 6

EdChum

Reputation: 393863

It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:

In [126]:
mask = df.applymap(lambda x: x is None)
cols = df.columns[(mask).any()]
for col in df[cols]:
    df.loc[mask[col], col] = ''
df

Out[126]:
   A                   B  C  D   E
0  A 2014-01-02 02:00:00     A   1
1  B 2014-01-02 03:00:00  B  B   2
2    2014-01-02 04:00:00  C  C NaN
3  C                 NaT  C      4

So we generate a mask of the None values using applymap, then iterate over each column of interest and use the boolean mask to set the values.
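
One thing an identity-based mask makes possible, shown here as a minimal sketch: an object column can hold both None and a float NaN, and an `x is None` check touches only the former. The frame below is a made-up example, not the question's data, and the name is_none is hypothetical. Series.map is used in place of applymap only because applymap is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Made-up frame: column A deliberately mixes None and a float NaN.
df = pd.DataFrame({
    "A": pd.Series(["A", None, np.nan, "C"], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

for col in df.columns:
    if df[col].dtype != object:
        continue  # skip numeric/datetime columns; their missing values are NaN/NaT
    # Element-wise identity check: True only where the cell is a real None.
    is_none = df[col].map(lambda x: x is None).astype(bool)
    df.loc[is_none, col] = ""

print(df)
```

Here only the None in row 1 of A becomes an empty string; the NaN in row 2 of A and the NaN in E stay as they are.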

Upvotes: 17
