dingaro

Reputation: 2342

How to fill NaN values in 3 columns per group defined by 2 other columns, leaving the remaining columns untouched, in a Pandas DataFrame?

I have a DataFrame in Python Pandas like the one below:

import pandas as pd

data = [
    (1, None, None, None, '2023-01-10', None, None),
    (1, None, None, None, '2023-01-10', 1, 0),
    (1, 9, 0, 0.55, '2023-01-10', 15, None),
    (2, None, None, None, '2023-11-22', 2, 1),
    (2, 88, 1, 0.68, '2023-11-22', 103, 8)
]

df = pd.DataFrame(data, columns=['id', 'col1', 'col2', 'col3', 'col_date', 'col4', 'col5'])
df


For each group defined by the columns id and col_date (datetime dtype), I need to fill the values in col1, col2 and col3. In every group, at least one row has values in col1, col2 and col3, and I need to copy those values to the remaining rows of that group.

The values in col4 and col5 (and in the many other columns not included in this example) have to stay untouched.

So as a result I need something like this:

data = [
    (1, 9, 0, 0.55, '2023-01-10', None, None),
    (1, 9, 0, 0.55, '2023-01-10', 1, 0),
    (1, 9, 0, 0.55, '2023-01-10', 15, None),
    (2, 88, 1, 0.68, '2023-11-22', 2, 1),
    (2, 88, 1, 0.68, '2023-11-22', 103, 8)
]

df = pd.DataFrame(data, columns=['id', 'col1', 'col2', 'col3', 'col_date', 'col4', 'col5'])
df

How can I do that in Python Pandas?

Upvotes: 0

Views: 60

Answers (2)

Pawan Tolani

Reputation: 295

You can try the following. The snippet merges the DataFrame onto itself after keeping only the rows without NaN. Note that df2 contains only the rows where all three columns (col1, col2 and col3) are non-null.

# Keep the group keys and the columns to fill.
df2 = df[['id', 'col_date', 'col1', 'col2', 'col3']].copy(deep=True)
# Retain only the rows where col1, col2 and col3 are all non-null.
df2 = df2[df2[['col1', 'col2', 'col3']].notna().all(axis=1)]
# Merge back on the group keys; the overlapping columns get _x / _y suffixes.
final = df.merge(df2, on=['id', 'col_date'], how='inner')
# Keep the filled (_y) versions and restore the original column names.
final = final[['id', 'col1_y', 'col2_y', 'col3_y', 'col_date', 'col4', 'col5']].rename(columns={'col1_y': 'col1', 'col2_y': 'col2', 'col3_y': 'col3'})
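A variant of the same self-merge idea, as a minimal sketch rather than a definitive implementation. It assumes that a group may contain more than one fully populated row (hence the drop_duplicates call) and that every column other than col1, col2 and col3 should be carried over without listing it by name. The names keys, fill_cols, lookup and filled are illustrative only.

```python
import pandas as pd

data = [
    (1, None, None, None, '2023-01-10', None, None),
    (1, None, None, None, '2023-01-10', 1, 0),
    (1, 9, 0, 0.55, '2023-01-10', 15, None),
    (2, None, None, None, '2023-11-22', 2, 1),
    (2, 88, 1, 0.68, '2023-11-22', 103, 8),
]
df = pd.DataFrame(data, columns=['id', 'col1', 'col2', 'col3', 'col_date', 'col4', 'col5'])

keys = ['id', 'col_date']             # columns that define a group
fill_cols = ['col1', 'col2', 'col3']  # columns to fill within each group

# One fully populated row per group serves as a lookup table.
lookup = (df.loc[df[fill_cols].notna().all(axis=1), keys + fill_cols]
            .drop_duplicates(subset=keys))

# Drop the sparse columns, then bring the complete values back with a left merge.
filled = df.drop(columns=fill_cols).merge(lookup, on=keys, how='left')

# Restore the original column order.
filled = filled[df.columns]
print(filled)
```

Because the merge is a left join, rows belonging to a group without any complete row are kept (with NaN in the filled columns) instead of being dropped as they would be with how='inner'.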

Upvotes: 0

Corralien

Reputation: 120559

You can still use my previous answer but select desired columns between groupby and transform:

>>> df.fillna(df.groupby(['id', 'col_date'])[['col1', 'col2', 'col3']]
                .transform('max'))
   id  col1  col2  col3    col_date   col4  col5
0   1   9.0   0.0  0.55  2023-01-10    NaN   NaN
1   1   9.0   0.0  0.55  2023-01-10    1.0   0.0
2   1   9.0   0.0  0.55  2023-01-10   15.0   NaN
3   2  88.0   1.0  0.68  2023-11-22    2.0   1.0
4   2  88.0   1.0  0.68  2023-11-22  103.0   8.0

'max' only works when the columns are numeric. If they are not, use a custom function instead: replace 'max' with lambda x: x.dropna().squeeze() or lambda x: x.ffill().bfill():

>>> df.dtypes
id            int64
col1         object
col2        float64
col3         object
col_date     object
col4        float64
col5        float64
dtype: object

>>> df.fillna(df.groupby(['id', 'col_date'])[['col1', 'col2', 'col3']]
                .transform(lambda x: x.ffill().bfill()))
   id  col1  col2  col3    col_date   col4  col5
0   1   9.0   0.0  0.55  2023-01-10    NaN   NaN
1   1   9.0   0.0  0.55  2023-01-10    1.0   0.0
2   1   9.0   0.0  0.55  2023-01-10   15.0   NaN
3   2  88.0   1.0  0.68  2023-11-22    2.0   1.0
4   2  88.0   1.0  0.68  2023-11-22  103.0   8.0
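The ffill/bfill variant can be tried on its own with a small self-contained example. This is only a sketch: the sample values (including the string column) are made up for illustration and are not the asker's data, and the names keys, fill_cols, group_values and result are hypothetical.

```python
import pandas as pd

# Made-up sample: col1 holds strings, so 'max' would not be a safe choice here.
df = pd.DataFrame({
    'id':       [1, 1, 1, 2, 2],
    'col_date': pd.to_datetime(['2023-01-10'] * 3 + ['2023-11-22'] * 2),
    'col1':     [None, None, 'x', None, 'y'],
    'col2':     [None, None, 0.55, None, 0.68],
    'col4':     [None, 1, 15, 2, 103],
})

keys = ['id', 'col_date']     # columns that define a group
fill_cols = ['col1', 'col2']  # columns to fill within each group

# Within each group, propagate the known value forward and then backward.
group_values = df.groupby(keys)[fill_cols].transform(lambda s: s.ffill().bfill())

# fillna with a DataFrame only touches the columns present in group_values,
# so col4 keeps its NaN.
result = df.fillna(group_values)
print(result)
```

Since only the selected columns are passed to fillna, any number of additional columns can be present in df without being modified.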

Upvotes: 0
