athelas
athelas

Reputation: 105

Python copy values from one row into other based on other column

I've been trying this the whole day, so I really hope someone can help me.

I have a dataframe that looks like this (sorry for the format, it didn't let me format it as table, because it then remarked there would be an error in my code...):

| ColumnA  | ColumnB |otherColumn    |
| -------- | ------- |-------------- |
| 1        | n.a.    |row x          |
| 1        | n.a.    |row b          |
| 1        | n.a.    |row x          |
| 2        | 23467   |row x          |
| 2        | n.a.    |row y          |
| 3        | n.a.    |row x          |
| 3        | 768345  |row  y         |
| 3        | n.a.    |row  y         |
| 3        | 768345  |row   x        |
| 4        | 95634511|row  x         |
| 4        | n.a.    |row    r       |
| 5        | n.a.    |row    d       |

I now need to fill the same values of ColumnB in those rows that have the same number in ColumnA. (otherColumn is not needed in this case, I've just added it to show that there are multiple other columns too). If there is no value in any of the rows in ColumnB belonging to the same number in ColumnA, it should remain n.a. (as e.g. for "1" and "5") So the desired output should be

| ColumnA  | ColumnB |otherColumn    |
| -------- | --------|-------------- |
| 1        | n.a.    |row x          |
| 1        | n.a.    |row b          |
| 1        | n.a.    |row x          |
| 2        | 23467   |row x          |
| 2        | 23467   |row y          |
| 3        | 768345  |row x          |
| 3        | 768345  |row  y         |
| 3        | 768345  |row  y         |
| 3        | 768345  |row   x        |
| 4        | 95634511|row  x         |
| 4        | 95634511|row    r       |
| 5        | n.a.    |row    d       |

I've tried it with converting the two columns to a dictionary, but was then not able to use the keys and values separately (to build an if-statement); I've tried it with producing a second dataframe consisting of ColumnA and ColumnB and trying to merge it (but then it added more rows); I've tried it with update() and combine_first(), which wasn't successful either.

Every advice is very much appreciated!!

Upvotes: 2

Views: 1429

Answers (2)

Anurag Dabas
Anurag Dabas

Reputation: 24304

Try replace()+groupby()+apply():

replace string 'n.a.' to actual NaN then groupby 'ColumnA' then forward fill 'ColumnB' values finally assign that back to 'ColumnB'

df['ColumnB']=(df.replace('n.a.',float('NaN'))
                 .groupby('ColumnA')['ColumnB']
                 .apply(lambda x:x.ffill().bfill()))

Note: you can also use transform in place of apply()

Upvotes: 2

Henry Yik
Henry Yik

Reputation: 22493

You can find the valid values with groupby and transform first:

print (df.replace('n.a.', np.NaN).groupby("ColumnA")["ColumnB"].transform("first"))

0         None
1         None
2         None
3        23467
4        23467
5       768345
6       768345
7       768345
8       768345
9     95634511
10    95634511
11        None

Upvotes: 3

Related Questions