dingaro
dingaro

Reputation: 2342

How to fill NaN values in 3 columns based on group of values in 2 other columns in Data Frame in Python Pandas?

I have Data Frame in Python Pandas like below:

data = [
    (1, None, None, None, '2023-01-10'),
    (1, None, None, None, '2023-01-10'),
    (1, 9, 0, 0.55, '2023-01-10'),
    (2, None, None, None, '2023-11-22'),
    (2, 88, 1, 0.68, '2023-11-22')
]

df = pd.DataFrame(data, columns=['id', 'col1', 'col2', 'col3', 'col_date'])
df

enter image description here

And I need to for each group of values from columns: id, col_date (data type datetime) fill values in columns: col1, col2, col3. For each group of values from columns: id, col_date at least one row has values in: col1, col2, col3 and I need to fill rest of rows by this values for each mentioned group.

So, as a result i need to have something like below:

data = [
    (1, 9, 0, 0.55, '2023-01-10'),
    (1, 9, 0, 0.55, '2023-01-10'),
    (1, 9, 0, 0.55, '2023-01-10'),
    (2, 88, 1, 0.68, '2023-11-22'),
    (2, 88, 1, 0.68, '2023-11-22')
]

df = pd.DataFrame(data, columns=['id', 'col1', 'col2', 'col3', 'col_date'])
df

enter image description here

How can I do that in Python Pandas ?

Upvotes: 0

Views: 84

Answers (3)

Corralien
Corralien

Reputation: 120559

You can use groupby_transform to broadcast the max value to all rows of the group (Nan will always be smaller than any value). As index is preserved, you can use fillna to fill missing values:

>>> df.fillna(df.groupby(['id', 'col_date']).transform('max'))

   id  col1  col2  col3   col_date
0   1   9.0   0.0  0.55 2023-01-10
1   1   9.0   0.0  0.55 2023-01-10
2   1   9.0   0.0  0.55 2023-01-10
3   2  88.0   1.0  0.68 2023-11-22
4   2  88.0   1.0  0.68 2023-11-22

Upvotes: 1

Kirill Kondratenko
Kirill Kondratenko

Reputation: 377

Try it:

df = df[['id', 'col_date']].merge(
        df.dropna(),
        on=['id', 'col_date'], how='left')

Output:

    col_date    id  col1    col2    col3
0   2023-01-10  1   9.0     0.0    0.55
1   2023-01-10  1   9.0     0.0    0.55
2   2023-01-10  1   9.0     0.0    0.55
3   2023-11-22  2   88.0    1.0    0.68
4   2023-11-22  2   88.0    1.0    0.68

Upvotes: -1

Ömer Sezer
Ömer Sezer

Reputation: 2669

Group by 'id' and 'col_date', and fill missing values with the first non-null value in each group with ffill, bfill:

Code:

df[['col1', 'col2', 'col3']] = df.groupby(['id', 'col_date'])[['col1', 'col2', 'col3']].transform(lambda x: x.fillna(method='ffill').fillna(method='bfill'))
df.reset_index(drop=True, inplace=True)

Output:

   id  col1  col2  col3    col_date
0   1   9.0   0.0  0.55  2023-01-10
1   1   9.0   0.0  0.55  2023-01-10
2   1   9.0   0.0  0.55  2023-01-10
3   2  88.0   1.0  0.68  2023-11-22
4   2  88.0   1.0  0.68  2023-11-22

Upvotes: 0

Related Questions