annonymous_guy
annonymous_guy

Reputation: 93

Pandas replace NaN values with a list for specific columns

I have a dataframe with two rows

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

With 8 nulls, it looks like this:

df = df.append(pd.DataFrame({'group': group}, index=[0] * size))

  group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN
0     c        NaN        NaN         NaN              NaN              NaN

What I want

Replace NaN values in sequences columns (seq_col, seq_col_2, seq_col_3 etc) with a list of my own.

Note: .

I could not find solutions that replaces NaN with a user provided list value from a dictionary suppose.

Pseudo Code:

for each key, value in dict,
   for each column in df
       if column matches key in dict
         # here matches means the 'seq_col_n' key of dict matched the df 
         # column named 'seq_col_n'
         replace NaN with value in seq_col_n (which is a list of numbers)

I tried this code below, it works for the first column you pass then for the second column it doesn't. Which is weird.

 df.loc[df['seq_col'].isnull(),['seq_col']] = df.loc[df['seq_col'].isnull(),'seq_col'].apply(lambda m: fill_values['seq_col'])

The above works but then try again on seq_col_2, it will give weird results.

Expected Output: Given param input:

my_dict = {seq_col: [1,2,3], seq_col_2: [6,7,8]}

# after executing the code from pseudo code given, it should look like
 group  grp_count  num_col_2  num_column          seq_col        seq_col_2
0     c        2.0        0.0         0.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1     c        2.0        1.0         1.0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]
0     c        NaN        NaN         NaN          [1,2,3]          [6,7,8]

Upvotes: 4

Views: 4565

Answers (1)

jpp
jpp

Reputation: 164693

With input arrays, you can use pd.DataFrame.loc with pd.Series.isnull:

import pandas as pd, numpy as np

df = pd.DataFrame({'group' : ['c'] * 2,
                   'num_column': range(2),
                   'num_col_2': range(2),
                   'seq_col': [[1,2,3,4,5]] * 2,
                   'seq_col_2': [[1,2,3,4,5]] * 2,
                   'grp_count': [2]*2})

df = df.append(pd.DataFrame({'group': ['c']*8}, index=[0] * 8))

L1 = np.array([0, 1, 2, 3, 4, 5, 6, 7])
L2 = np.array([10, 11, 12, 13, 14, 15, 16, 17])

df.loc[df['seq_col'].isnull(), 'seq_col'] = L1
df.loc[df['seq_col_2'].isnull(), 'seq_col_2'] = L2

print(df[['seq_col', 'seq_col_2']])

           seq_col        seq_col_2
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
0                0               10
0                1               11
0                2               12
0                3               13
0                4               14
0                5               15
0                6               16
0                7               17

If you need list values in your series, then you can convert to a series explicitly before assignment:

df.loc[df['seq_col'].isnull(), 'seq_col'] = pd.Series([[1, 2, 3]]*len(df))

Upvotes: 3

Related Questions