Reputation: 105
I've been trying this the whole day, so I really hope someone can help me.
I have a dataframe that looks like this (sorry for the format, it didn't let me format it as table, because it then remarked there would be an error in my code...):
| ColumnA | ColumnB |otherColumn |
| -------- | ------- |-------------- |
| 1 | n.a. |row x |
| 1 | n.a. |row b |
| 1 | n.a. |row x |
| 2 | 23467 |row x |
| 2 | n.a. |row y |
| 3 | n.a. |row x |
| 3 | 768345 |row y |
| 3 | n.a. |row y |
| 3 | 768345 |row x |
| 4 | 95634511|row x |
| 4 | n.a. |row r |
| 5 | n.a. |row d |
I now need to fill the same values of ColumnB in those rows that have the same number in ColumnA. (otherColumn is not needed in this case, I've just added it to show that there are multiple other columns too). If there is no value in any of the rows in ColumnB belonging to the same number in ColumnA, it should remain n.a. (as e.g. for "1" and "5") So the desired output should be
| ColumnA | ColumnB |otherColumn |
| -------- | --------|-------------- |
| 1 | n.a. |row x |
| 1 | n.a. |row b |
| 1 | n.a. |row x |
| 2 | 23467 |row x |
| 2 | 23467 |row y |
| 3 | 768345 |row x |
| 3 | 768345 |row y |
| 3 | 768345 |row y |
| 3 | 768345 |row x |
| 4 | 95634511|row x |
| 4 | 95634511|row r |
| 5 | n.a. |row d |
I've tried it with converting the two columns to a dictionary, but was then not able to use the keys and values separately (to build an if-statement); I've tried it with producing a second dataframe consisting of ColumnA and ColumnB and trying to merge it (but then it added more rows); I've tried it with update() and combine_first(), which wasn't successful either.
Every advice is very much appreciated!!
Upvotes: 2
Views: 1429
Reputation: 24304
Try replace()
+groupby()
+apply()
:
replace string 'n.a.' to actual NaN then groupby 'ColumnA' then forward fill 'ColumnB' values finally assign that back to 'ColumnB'
df['ColumnB']=(df.replace('n.a.',float('NaN'))
.groupby('ColumnA')['ColumnB']
.apply(lambda x:x.ffill().bfill()))
Note: you can also use transform
in place of apply()
Upvotes: 2
Reputation: 22493
You can find the valid values with groupby
and transform first
:
print (df.replace('n.a.', np.NaN).groupby("ColumnA")["ColumnB"].transform("first"))
0 None
1 None
2 None
3 23467
4 23467
5 768345
6 768345
7 768345
8 768345
9 95634511
10 95634511
11 None
Upvotes: 3