In a Python script using the pandas library, I have a dataset of, say, 100 rows with a feature "X" that contains 36 NaN values, and a list of size 36.
I want to replace the 36 missing values in column "X" with the 36 values from my list, in order.
It's probably a dumb question, but I went through all the docs and couldn't find a way to do it.
Here's an example:
INPUT
Data:
X    Y
1    8
2    3
NaN  2
NaN  7
1    2
NaN  2
Filler list: [8, 6, 3]
OUTPUT
Data:
X    Y
1    8
2    3
8    2
6    7
1    2
3    2
Upvotes: 7
Reputation: 5213
Start with your dataframe df:
print(df)
X Y
0 1.0 8
1 2.0 3
2 NaN 2
3 NaN 7
4 1.0 2
5 NaN 2
Define the values you want to fill with (note: the filler list must contain exactly as many elements as there are NaN values in the column):
filler = [8, 6, 3]
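Select the rows of column X that contain NaN and overwrite them with your filler. Use .loc rather than chained indexing such as df.X[df.X.isnull()] = filler: chained assignment triggers pandas warnings and no longer updates df once Copy-on-Write is enabled (the default in pandas 3.0).
df.loc[df['X'].isnull(), 'X'] = filler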
which gives:
print(df)
X Y
0 1.0 8
1 2.0 3
2 8.0 2
3 6.0 7
4 1.0 2
5 3.0 2
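Because the assignment is purely positional (the first NaN gets filler[0], the second gets filler[1], and so on), it is worth checking the counts before writing. A minimal sketch, assuming a hypothetical helper named fill_nans_in_order (not a pandas function) that returns a filled copy and refuses mismatched lengths:

```python
import numpy as np
import pandas as pd


def fill_nans_in_order(df, column, filler):
    """Hypothetical helper: replace the NaNs of `column`, top to bottom,
    with the values of `filler`, after checking that the counts match."""
    mask = df[column].isna()
    n_missing = int(mask.sum())
    if n_missing != len(filler):
        raise ValueError(
            f"{column!r} has {n_missing} NaN values but filler has {len(filler)} items"
        )
    out = df.copy()                  # leave the caller's frame untouched
    out.loc[mask, column] = filler   # positional fill, in row order
    return out


df = pd.DataFrame({'X': [1, 2, np.nan, np.nan, 1, np.nan],
                   'Y': [8, 3, 2, 7, 2, 2]})
result = fill_nans_in_order(df, 'X', [8, 6, 3])
print(result['X'].tolist())  # [1.0, 2.0, 8.0, 6.0, 1.0, 3.0]
```

Returning a copy is a design choice of this sketch; assign the result back to df if you want the in-place behaviour of the answer above.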
Upvotes: 11
Reputation: 9711
This may not be the most efficient approach, but it works :) First find the index labels of all the NaNs, then replace them one by one in a loop. This assumes the list has at least as many values as there are NaNs.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan], 'C': [[20, 21, 22], [23, 24, 25], np.nan]})
lst = [12, 35, 78]
print(df)
index = df['B'].index[df['B'].isna()]  # labels of the rows where B is NaN
cnt = 0
for item in index:
    df.at[item, 'B'] = lst[cnt]  # replace the cnt-th NaN with the cnt-th value of the list
    cnt = cnt + 1
print(df)
Output (before and after filling):
     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0   NaN  [23, 24, 25]
2  2.0   NaN           NaN

     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0  12.0  [23, 24, 25]
2  2.0  35.0           NaN
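The loop can also be collapsed into a single .loc assignment. Since this answer allows the list to be longer than the number of NaNs, a sketch under that assumption slices off only as many list values as are needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 1, 2],
                   'B': [10, np.nan, np.nan],
                   'C': [[20, 21, 22], [23, 24, 25], np.nan]})
lst = [12, 35, 78]

mask = df['B'].isna()
# Use only the first n list values, n = number of NaNs in B
# (assumes len(lst) >= n; extra values are ignored)
df.loc[mask, 'B'] = lst[:int(mask.sum())]
print(df['B'].tolist())  # [10.0, 12.0, 35.0]
```

Columns A and C are left alone because the mask only selects rows and the assignment only targets column B.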
Upvotes: 1
Reputation: 10409
One way is to keep a separate counter that points into your custom list and advance it each time you replace a NaN:
import numpy as np
import pandas as pd
your_df = pd.DataFrame({'your_column': [0,1,2,np.nan,4,6,np.nan,np.nan,7,8,np.nan,9]})  # a df with 4 NaN's
print(your_df)
your_custom_list = [1,3,6,8]  # custom list with 4 fillers
# take a writable copy of the values (under Copy-on-Write, .values may be read-only)
your_column_vals = your_df['your_column'].to_numpy(copy=True)
i_custom = 0  # current position in your custom list
for i in range(len(your_column_vals)):
    if np.isnan(your_column_vals[i]):
        your_column_vals[i] = your_custom_list[i_custom]
        i_custom += 1  # move on to the next filler
your_df['your_column'] = your_column_vals
print(your_df)
Output (before and after filling):
your_column
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
5 6.0
6 NaN
7 NaN
8 7.0
9 8.0
10 NaN
11 9.0
your_column
0 0.0
1 1.0
2 2.0
3 1.0
4 4.0
5 6.0
6 3.0
7 6.0
8 7.0
9 8.0
10 8.0
11 9.0
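The same counter idea can be expressed with an actual Python iterator: next() hands out the custom values one at a time, only when a NaN is encountered. A sketch, assuming a numeric column; if the list is too short, next() raises StopIteration instead of silently leaving NaNs behind.

```python
import numpy as np
import pandas as pd

your_df = pd.DataFrame({'your_column': [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]})
your_custom_list = [1, 3, 6, 8]

fillers = iter(your_custom_list)  # yields the custom values in order
# keep non-NaN values as they are, pull the next filler for each NaN
your_df['your_column'] = [next(fillers) if pd.isna(v) else v
                          for v in your_df['your_column']]
print(your_df['your_column'].tolist())
# [0.0, 1.0, 2.0, 1.0, 4.0, 6.0, 3.0, 6.0, 7.0, 8.0, 8.0, 9.0]
```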
Upvotes: 1