user11918954
user11918954

Reputation:

Assign value to user from another row value

I have following data:

data = {'id':[1, 1, 1, 2, 2, 2, 2, 3, 3], 
        'login': ['2013-05', '2013-05', '2013-06', '2013-06', '2013-07', '2013-07', '2013-08', '2013-06', '2013-07'],
'rn': [1, 2, 3, 1, 2, 3, 4, 1, 2]
} 

And I would like to add column 'category' with values, where rn = 1. Was trying this:

data['category'] = data[data['rn'] == 1]['login']

But it assigns category only for row with rn = 1. Result:

+----+---------+----+----------+
| id |  login  | rn | category |
+----+---------+----+----------+
|  1 | 2013-05 |  1 | 2013-05  |
|  1 | 2013-05 |  2 | NaN      |
|  1 | 2013-06 |  3 | NaN      |
|  2 | 2013-06 |  1 | 2013-06  |
|  2 | 2013-06 |  2 | NaN      |
|  2 | 2013-07 |  3 | NaN      |
|  3 | 2013-06 |  1 | 2013-06  |
|  3 | 2013-07 |  2 | NaN      |
+----+---------+----+----------+

I would like to get the following result:

+----+---------+----+----------+
| id |  login  | rn | category |
+----+---------+----+----------+
|  1 | 2013-05 |  1 | 2013-05  |
|  1 | 2013-05 |  2 | 2013-05  |
|  1 | 2013-06 |  3 | 2013-05  |
|  2 | 2013-06 |  1 | 2013-06  |
|  2 | 2013-06 |  2 | 2013-06  |
|  2 | 2013-07 |  3 | 2013-06  |
|  3 | 2013-06 |  1 | 2013-06  |
|  3 | 2013-07 |  2 | 2013-06  |
+----+---------+----+----------+

Upvotes: 2

Views: 58

Answers (2)

bharatk
bharatk

Reputation: 4315

Use ffill()

Ex.

import pandas as pd

data = {'id':[1, 1, 1, 2, 2, 2, 2, 3, 3],
        'login': ['2013-05', '2013-05', '2013-06', '2013-06', '2013-07', '2013-07', '2013-08', '2013-06', '2013-07'],
'rn': [1, 2, 3, 1, 2, 3, 4, 1, 2]
}
df = pd.DataFrame(data)
df['category'] = df[df['rn'] == 1]['login']
df['category'].ffill(inplace=True)
print(df)

O/P:

  id    login  rn category
0   1  2013-05   1  2013-05
1   1  2013-05   2  2013-05
2   1  2013-06   3  2013-05
3   2  2013-06   1  2013-06
4   2  2013-07   2  2013-06
5   2  2013-07   3  2013-06
6   2  2013-08   4  2013-06
7   3  2013-06   1  2013-06
8   3  2013-07   2  2013-06

Upvotes: 1

jezrael
jezrael

Reputation: 863166

First filter by condition, create Series with DataFrame.set_index and last use Series.map:

s = data[data['rn'] == 1].set_index('id')['login']
#if possible multiple rn == 1 per groups
#s = data[data['rn'] == 1].drop_duplicates('id').set_index('id')['login']
data['category'] = data['id'].map(s)
print (data)
   id    login  rn category
0   1  2013-05   1  2013-05
1   1  2013-05   2  2013-05
2   1  2013-06   3  2013-05
3   2  2013-06   1  2013-06
4   2  2013-07   2  2013-06
5   2  2013-07   3  2013-06
6   2  2013-08   4  2013-06
7   3  2013-06   1  2013-06
8   3  2013-07   2  2013-06

If rn == 1 exist and is always first per each groups is possible create mising values by Series.where and then forward filling missing values:

data['category'] = data['login'].where(data['rn'] == 1).ffill()
print (data)
   id    login  rn category
0   1  2013-05   1  2013-05
1   1  2013-05   2  2013-05
2   1  2013-06   3  2013-05
3   2  2013-06   1  2013-06
4   2  2013-07   2  2013-06
5   2  2013-07   3  2013-06
6   2  2013-08   4  2013-06
7   3  2013-06   1  2013-06
8   3  2013-07   2  2013-06

Upvotes: 0

Related Questions