Reputation: 2342
I have Data Frame in Python Pandas like below:
data = [
(1, None, None, None, '2023-01-10'),
(1, None, None, None, '2023-01-10'),
(1, 9, 0, 0.55, '2023-01-10'),
(2, None, None, None, '2023-11-22'),
(2, 88, 1, 0.68, '2023-11-22')
]
df = pd.DataFrame(data, columns=['id', 'col1', 'col2', 'col3', 'col_date'])
df
And I need to for each group of values from columns: id, col_date (data type datetime) fill values in columns: col1, col2, col3. For each group of values from columns: id, col_date at least one row has values in: col1, col2, col3 and I need to fill rest of rows by this values for each mentioned group.
So, as a result i need to have something like below:
data = [
(1, 9, 0, 0.55, '2023-01-10'),
(1, 9, 0, 0.55, '2023-01-10'),
(1, 9, 0, 0.55, '2023-01-10'),
(2, 88, 1, 0.68, '2023-11-22'),
(2, 88, 1, 0.68, '2023-11-22')
]
df = pd.DataFrame(data, columns=['id', 'col1', 'col2', 'col3', 'col_date'])
df
How can I do that in Python Pandas ?
Upvotes: 0
Views: 84
Reputation: 120559
You can use groupby_transform
to broadcast the max
value to all rows of the group (Nan
will always be smaller than any value). As index is preserved, you can use fillna
to fill missing values:
>>> df.fillna(df.groupby(['id', 'col_date']).transform('max'))
id col1 col2 col3 col_date
0 1 9.0 0.0 0.55 2023-01-10
1 1 9.0 0.0 0.55 2023-01-10
2 1 9.0 0.0 0.55 2023-01-10
3 2 88.0 1.0 0.68 2023-11-22
4 2 88.0 1.0 0.68 2023-11-22
Upvotes: 1
Reputation: 377
Try it:
df = df[['id', 'col_date']].merge(
df.dropna(),
on=['id', 'col_date'], how='left')
Output:
col_date id col1 col2 col3
0 2023-01-10 1 9.0 0.0 0.55
1 2023-01-10 1 9.0 0.0 0.55
2 2023-01-10 1 9.0 0.0 0.55
3 2023-11-22 2 88.0 1.0 0.68
4 2023-11-22 2 88.0 1.0 0.68
Upvotes: -1
Reputation: 2669
Group by 'id' and 'col_date', and fill missing values with the first non-null value in each group with ffill, bfill
:
Code:
df[['col1', 'col2', 'col3']] = df.groupby(['id', 'col_date'])[['col1', 'col2', 'col3']].transform(lambda x: x.fillna(method='ffill').fillna(method='bfill'))
df.reset_index(drop=True, inplace=True)
Output:
id col1 col2 col3 col_date
0 1 9.0 0.0 0.55 2023-01-10
1 1 9.0 0.0 0.55 2023-01-10
2 1 9.0 0.0 0.55 2023-01-10
3 2 88.0 1.0 0.68 2023-11-22
4 2 88.0 1.0 0.68 2023-11-22
Upvotes: 0