Reputation: 737
I have data in following format:
8A564 nan json
8A928 nan json
8A563 nan json
8A564 10616280 json
8A563 10616222 json
8A564 nan json
8B1BB 10982483 json
8A564 10616280 json
I would like to fill data in second column to match row that has same first column and non null value in second. So I would get following:
8A564 10616280 json
8A928 nan json
8A563 10616222 json
8A564 10616280 json
8A563 10616222 json
8A564 10616280 json
8B1BB 10982483 json
8A564 10616280 json
How can it be achieved?
Upvotes: 7
Views: 105
Reputation: 2032
here's a python one-liner to help you.
df.iloc[:, [0,1]].groupby(0).apply(lambda x: x[1].fillna( np.nan if len(x[1].dropna()) == 0 else x[1].dropna().mode()[0])).reset_index()[[0,1]]
It uses group-by and also fills only if certain condition is met.
Upvotes: 0
Reputation: 294228
groupby
and bfill
Keep in mind the the 0
in groupby(0)
refers to the column named 0
. If your column has a different name, use that.
df.groupby(0).bfill()
0 1 2
0 8A564 10616280 json
1 8A928 NaN json
2 8A563 10616222 json
3 8A564 10616280 json
4 8A563 10616222 json
5 8A564 10616280 json
6 8B1BB 10982483 json
7 8A564 10616280 json
If the ordering of what is null doesn't lend itself to back filling, you can get the first non-null value.
df[1] = df.groupby(0)[1].transform('first')
df
0 1 2
0 8A564 10616280 json
1 8A928 NaN json
2 8A563 10616222 json
3 8A564 10616280 json
4 8A563 10616222 json
5 8A564 10616280 json
6 8B1BB 10982483 json
7 8A564 10616280 json
Upvotes: 5