Reputation: 6848
This is my dataframe:
date ids
0 2011-04-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1 2011-04-24 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
2 2011-04-25 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
3 2011-04-26 Nan
4 2011-04-27 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
5 2011-04-28 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
I want to replace NaN with []. How can I do that? .fillna([]) did not work. I even tried replace(np.nan, []), but it gives this error:
TypeError('Invalid "to_replace" type: \'float\'',)
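For anyone who wants to reproduce this, here is a minimal sketch of a similar frame. The short lists are placeholders, not the original data, and since the exact exception and message differ between pandas versions, the snippet only records what happens instead of assuming a particular error.

```python
import numpy as np
import pandas as pd

# Stand-in data: short lists instead of the truncated ones shown above
df = pd.DataFrame({
    "date": pd.to_datetime(["2011-04-23", "2011-04-24", "2011-04-25", "2011-04-26"]),
    "ids": [[0, 1, 2], [0, 1], [3], np.nan],
})

# Try the call from the question and record the outcome,
# whatever it is on the installed pandas version
try:
    df["ids"].fillna([])
    outcome = "no error"
except (TypeError, ValueError) as exc:
    outcome = f"{type(exc).__name__}: {exc}"

print(df["ids"].isna().tolist())
print(outcome)
```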
Upvotes: 93
Views: 83911
Reputation: 456
Maybe not the shortest or most optimized solution, but I think it is pretty readable:
# Masking-in nans
mask = df['ids'].isna()
# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(eval)
EDIT
Following the advice from Swier's comment:
# Packages
import ast
# Masking-in nans
mask = df['ids'].isna()
# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(ast.literal_eval)
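As a small illustration of why ast.literal_eval is the safer choice here: it only accepts Python literals, so the string '[]' parses fine while anything that would execute code is rejected. The strings below are made up for the demonstration.

```python
import ast

# literal_eval parses Python literals such as lists, numbers and strings
empty = ast.literal_eval("[]")
print(empty)

# An expression that is not a literal (here, a function call) is refused
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```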
Upvotes: 4
Reputation: 13
I have solved a more complex case and want to share the solution here.
In each cell of the DataFrame there is a nested list with 100 sublists [a, b] inside. Some values in the columns bids_aggr3 and asks_aggr3 are np.nan. The number of columns containing NAs is unknown. The example above shows only 2.
The goal is to use DataFrame.explode() to expand each row into 100 rows, each containing one sublist of the original cell for each column. This requires, however, that the values in each cell (the nested lists) have the same length. So I need to fill the NAs with a nested list like [[na, na], [na, na], ...[na, na]] of length 100.
After some research I came to a generic solution that replaces NAs without specifying the columns.
nan_cell = [[np.nan, np.nan]]*100
rows = df.loc[df.isna().any(axis=1)].index
columns = df.columns[df.isna().any(axis=0)]
df.loc[rows, columns] = pd.Series([nan_cell]*len(rows))
Line 1 generates the new nested list. Lines 2 and 3 locate the rows and columns that contain NAs. Line 4 uses loc to set the nested list as the value of each NA cell. loc requires a scalar or a Series of the same length as input.
However, if the NAs do not occur in sequence, or do not start at the top or bottom of the column but sit somewhere in the middle of the dataframe and in different rows for each column, the solution above will not work.
In this case you can use the [index, column] pairs to access each cell directly and modify its value.
Example:
df = pd.DataFrame({'a': [0, 1, 2, np.nan, np.nan, 5, 6], 'b': [22, 23, 2, 1, 0, np.nan, 99]})
mask = df.isna().stack()
cells = mask.loc[mask].index.tolist()  # list of (index, column) pairs
cells
# idx, idy = np.where(pd.isnull(df))
# cells = np.column_stack([df.index[idx], df.columns[idy]])
# cells
df = df.astype(object)  # float columns cannot hold list objects
for idx, col in cells:
    df.at[idx, col] = [np.nan, np.nan]
The for loop could become a performance bottleneck if the DataFrame is large. If someone knows a more Pythonic/vectorized way, please share it.
Hopefully someone will find this helpful. Cheers!
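One possible alternative to tracking the (index, column) pairs by hand is an element-wise apply over every column. This is only a sketch: it is not truly vectorized (a Python function still runs per cell), and the small frame and the fill_cell helper are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaNs scattered across rows and columns
df = pd.DataFrame({"a": [0, 1, np.nan, 5], "b": [22, np.nan, 1, np.nan]})

def fill_cell(x):
    # Replace a float NaN with a fresh [nan, nan] pair; keep anything else
    if isinstance(x, float) and np.isnan(x):
        return [np.nan, np.nan]
    return x

# Cast each column to object first so it can hold list objects
out = df.apply(lambda col: col.astype(object).apply(fill_cell))
print(out)
```

Because fill_cell builds a new list on every call, the filled cells do not share one list object.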
Upvotes: 0
Reputation: 19037
Another solution that is explicit:
# use apply to only replace the nulls with the list
df.loc[df.ids.isnull(), 'ids'] = df.loc[df.ids.isnull(), 'ids'].apply(lambda x: [])
Upvotes: 2
Reputation: 21
You can try this:
df.fillna(df.notna().applymap(lambda x: x or []))
Upvotes: 2
Reputation: 1583
A simple solution would be:
df['ids'].fillna("").apply(list)
As noted by @timgeb, this requires df['ids'] to contain lists or nan only.
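A short sketch of that caveat, using made-up Series: with only lists and NaN the result is what you want, but any string already in the column is split into characters by list().

```python
import numpy as np
import pandas as pd

# Hypothetical column holding only lists and NaN
ok = pd.Series([[1, 2], np.nan, [3]])
filled = ok.fillna("").apply(list)
print(filled.tolist())

# A string value in the column gets split into single characters
mixed = pd.Series([[1, 2], np.nan, "ab"], dtype=object)
split = mixed.fillna("").apply(list)
print(split.tolist())
```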
Upvotes: 52
Reputation: 78700
Surprisingly, passing a dict with empty lists as values seems to work for Series.fillna, but not DataFrame.fillna, so if you want to work on a single column you can use this:
>>> df
A B C
0 0.0 2.0 NaN
1 NaN NaN 5.0
2 NaN 7.0 NaN
>>> df['C'].fillna({i: [] for i in df.index})
0 []
1 5
2 []
Name: C, dtype: object
The solution can be extended to DataFrames by applying it to every column.
>>> df.apply(lambda s: s.fillna({i: [] for i in df.index}))
A B C
0 0 2 []
1 [] [] 5
2 [] 7 []
Note: for large Series/DataFrames with few missing values, this might create an unreasonable amount of throwaway empty lists.
Tested with pandas 1.0.5.
Upvotes: 13
Reputation: 500
This is probably a faster, one-line solution:
df['ids'].fillna('DELETE').apply(lambda x : [] if x=='DELETE' else x)
Upvotes: 1
Reputation: 19947
Another solution using numpy:
df.ids = np.where(df.ids.isnull(), pd.Series([[]]*len(df)), df.ids)
Or using combine_first:
df.ids = df.ids.combine_first(pd.Series([[]]*len(df)))
Upvotes: 4
Reputation: 4543
Maybe more compact:
df['ids'] = [[] if type(x) != list else x for x in df['ids']]
Upvotes: 1
Reputation: 17
Create a function that checks your condition and returns an empty list (or empty set, etc.) when the condition is not met.
Then apply that function to the column, assigning the result back to the same column or to a new one if you wish.
aa = pd.DataFrame({'d': [1, 1, 2, 3, 3, np.nan], 'r': [3, 5, 5, 5, 5, 'e']})
def check_condition(x):
    # NaN > 0 is False, so missing values fall through to the empty list
    if x > 0:
        return x
    else:
        return list()
aa['d'] = aa.d.apply(check_condition)
Upvotes: 0
Reputation: 1368
My approach is similar to @hellpanderrr's, but tests for list-ness rather than using isnan:
df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])
I originally tried using pd.isnull (or pd.notnull) but, when given a list, that returns the null-ness of each element.
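Here is a self-contained sketch of both points, with a small made-up frame: the isinstance check replaces the NaN, and pd.isnull on a list returns one flag per element rather than a single boolean.

```python
import numpy as np
import pandas as pd

# Stand-in for the frame in the question
df = pd.DataFrame({"ids": [[0, 1], np.nan, [2]]})
df["ids"] = df["ids"].apply(lambda d: d if isinstance(d, list) else [])
print(df["ids"].tolist())

# pd.isnull on a list checks every element, so it cannot be used
# directly as a single True/False test inside the lambda
elementwise = pd.isnull([0, np.nan])
print(elementwise)
```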
Upvotes: 75
Reputation: 2274
After a lot of head-scratching I found this method that should be the most efficient (no looping, no apply), just assigning to a slice:
isnull = df.ids.isnull()
df.loc[isnull, 'ids'] = [ [[]] * isnull.sum() ]
The trick was to construct your list of [] of the right size (isnull.sum()), and then enclose it in a list: the value you are assigning is a 2D array (1 column, isnull.sum() rows) containing empty lists as elements.
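One thing to keep in mind with the [[]] * n construction, shown here in plain Python: it repeats a single list object n times. Whether the cells end up sharing that object after the assignment depends on how pandas stores the values, so treat this as a caution, not a statement about the result above.

```python
# [[]] * n holds n references to the same list
shared = [[]] * 3
shared[0].append("x")
print(shared)

# A comprehension builds a separate list for every slot
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)
```

If you plan to append to the filled cells in place later, building the lists with a comprehension avoids any surprise.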
Upvotes: 41
Reputation: 5896
Without assignments:
1) Assuming we have only floats and integers in our dataframe
import math
df.apply(lambda x:x.apply(lambda x:[] if math.isnan(x) else x))
2) For any dataframe
import math
def isnan(x):
    # Python 3 has no long, and math.isnan raises TypeError on complex values
    return isinstance(x, (int, float)) and math.isnan(x)
df.apply(lambda x:x.apply(lambda x:[] if isnan(x) else x))
Upvotes: 3
Reputation: 109546
You can first use loc to locate all rows that have a nan in the ids column, and then loop through these rows using at to set their values to an empty list:
for row in df.loc[df.ids.isnull(), 'ids'].index:
    df.at[row, 'ids'] = []
>>> df
date ids
0 2011-04-23 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
1 2011-04-24 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
2 2011-04-25 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3 2011-04-26 []
4 2011-04-27 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5 2011-04-28 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
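For completeness, a runnable sketch of the same loop on a tiny made-up frame (two rows only, so it is not the data from the question). It relies on the ids column having object dtype, which is what you get when it holds lists.

```python
import numpy as np
import pandas as pd

# Stand-in frame: one list and one missing value
df = pd.DataFrame({"date": ["2011-04-25", "2011-04-26"],
                   "ids": [[0, 1, 2], np.nan]})

# Visit only the rows where ids is missing and set each cell to a new list
for row in df.index[df["ids"].isna()]:
    df.at[row, "ids"] = []

print(df["ids"].tolist())
```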
Upvotes: 26