waitingkuo
waitingkuo

Reputation: 93794

In Pandas, How to use fillna to fill the whole columns with string if the column is empty originally?

My table:

In [15]: csv=u"""a,a,,a
   ....: b,b,,b
   ....: c,c,,c
   ....: """

In [18]: df = pd.read_csv(io.StringIO(csv), header=None)

Fill the empty columns as 'UNKNOWN'

In [19]: df
Out[19]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [20]: df.fillna({2:'UNKNOWN'})

Got the error

ValueError: could not convert string to float: UNKNOWN

Upvotes: 10

Views: 28844

Answers (3)

Behrooz Ostadaghaee
Behrooz Ostadaghaee

Reputation: 91

df = pd.DataFrame({0:['a','b','c'], 1:['a','b','c'], 2:np.nan, 3:['a','b','c']})
df

    0   1   2   3
0   a   a   NaN a
1   b   b   NaN b
2   c   c   NaN c

you could do this by specifying the name of the column inside square brackets and using fillna:

df[2].fillna('UNKNOWN', inplace=True)

If you print df, it will be like this:

    0   1         2  3
0   a   a   UNKNOWN  a
1   b   b   UNKNOWN  b
2   c   c   UNKNOWN  c

you could fill all empty cells in all the columns by:

df.fillna('UNKNOWN', inplace=True)

Upvotes: 0

DSM
DSM

Reputation: 353059

Your 2 column probably has a float dtype:

>>> df
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
>>> df.dtypes
0     object
1     object
2    float64
3     object
dtype: object

Hence the problem. If you don't mind converting the whole frame to object, you could:

>>> df.astype(object).fillna("UNKNOWN")
   0  1        2  3
0  a  a  UNKNOWN  a
1  b  b  UNKNOWN  b
2  c  c  UNKNOWN  c

Depending on whether there's non-string data you might want to be more selective about converting column dtypes, and/or specify the dtypes on read, but the above should work, anyhow.


Update: if you have dtype information you want to preserve, rather than switching it back, I'd go the other way and only fill on the columns that you wanted to, either using a loop with fillna:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.dtypes
0      int64
1     object
2     object
3    float64
4     object
5    float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
...         df[col] = df[col].astype(object).fillna("UNKNOWN")
...     
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
>>> df.dtypes
0     int64
1    object
2    object
3    object
4    object
5    object
dtype: object

Or (if you're using all), then maybe not even use fillna at all:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.ix[:,pd.isnull(df).all()] = "UNKNOWN"
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN

Upvotes: 11

Jeff
Jeff

Reputation: 128958

As a workaround, just set the column directly, the fillna upconversion should work and is a bug

In [8]: df = pd.read_csv(io.StringIO(csv), header=None)

In [9]: df
Out[9]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [10]: df.loc[:,2] = 'foo'

In [11]: df
Out[11]: 
   0  1    2  3
0  a  a  foo  a
1  b  b  foo  b
2  c  c  foo  c

In [12]: df.dtypes
Out[12]: 
0    object
1    object
2    object
3    object
dtype: object

Upvotes: 4

Related Questions