Reputation:
I have following data:
data = {'id':[1, 1, 1, 2, 2, 2, 2, 3, 3],
'login': ['2013-05', '2013-05', '2013-06', '2013-06', '2013-07', '2013-07', '2013-08', '2013-06', '2013-07'],
'rn': [1, 2, 3, 1, 2, 3, 4, 1, 2]
}
And I would like to add column 'category' with values, where rn = 1. Was trying this:
data['category'] = data[data['rn'] == 1]['login']
But it assigns category only for row with rn = 1. Result:
+----+---------+----+----------+
| id | login | rn | category |
+----+---------+----+----------+
| 1 | 2013-05 | 1 | 2013-05 |
| 1 | 2013-05 | 2 | NaN |
| 1 | 2013-06 | 3 | NaN |
| 2 | 2013-06 | 1 | 2013-06 |
| 2 | 2013-06 | 2 | NaN |
| 2 | 2013-07 | 3 | NaN |
| 3 | 2013-06 | 1 | 2013-06 |
| 3 | 2013-07 | 2 | NaN |
+----+---------+----+----------+
I would like to get the following result:
+----+---------+----+----------+
| id | login | rn | category |
+----+---------+----+----------+
| 1 | 2013-05 | 1 | 2013-05 |
| 1 | 2013-05 | 2 | 2013-05 |
| 1 | 2013-06 | 3 | 2013-05 |
| 2 | 2013-06 | 1 | 2013-06 |
| 2 | 2013-06 | 2 | 2013-06 |
| 2 | 2013-07 | 3 | 2013-06 |
| 3 | 2013-06 | 1 | 2013-06 |
| 3 | 2013-07 | 2 | 2013-06 |
+----+---------+----+----------+
Upvotes: 2
Views: 58
Reputation: 4315
Use ffill()
Ex.
import pandas as pd
data = {'id':[1, 1, 1, 2, 2, 2, 2, 3, 3],
'login': ['2013-05', '2013-05', '2013-06', '2013-06', '2013-07', '2013-07', '2013-08', '2013-06', '2013-07'],
'rn': [1, 2, 3, 1, 2, 3, 4, 1, 2]
}
df = pd.DataFrame(data)
df['category'] = df[df['rn'] == 1]['login']
df['category'].ffill(inplace=True)
print(df)
O/P:
id login rn category
0 1 2013-05 1 2013-05
1 1 2013-05 2 2013-05
2 1 2013-06 3 2013-05
3 2 2013-06 1 2013-06
4 2 2013-07 2 2013-06
5 2 2013-07 3 2013-06
6 2 2013-08 4 2013-06
7 3 2013-06 1 2013-06
8 3 2013-07 2 2013-06
Upvotes: 1
Reputation: 863166
First filter by condition, create Series
with DataFrame.set_index
and last use Series.map
:
s = data[data['rn'] == 1].set_index('id')['login']
#if possible multiple rn == 1 per groups
#s = data[data['rn'] == 1].drop_duplicates('id').set_index('id')['login']
data['category'] = data['id'].map(s)
print (data)
id login rn category
0 1 2013-05 1 2013-05
1 1 2013-05 2 2013-05
2 1 2013-06 3 2013-05
3 2 2013-06 1 2013-06
4 2 2013-07 2 2013-06
5 2 2013-07 3 2013-06
6 2 2013-08 4 2013-06
7 3 2013-06 1 2013-06
8 3 2013-07 2 2013-06
If rn == 1
exist and is always first per each groups is possible create mising values by Series.where
and then forward filling missing values:
data['category'] = data['login'].where(data['rn'] == 1).ffill()
print (data)
id login rn category
0 1 2013-05 1 2013-05
1 1 2013-05 2 2013-05
2 1 2013-06 3 2013-05
3 2 2013-06 1 2013-06
4 2 2013-07 2 2013-06
5 2 2013-07 3 2013-06
6 2 2013-08 4 2013-06
7 3 2013-06 1 2013-06
8 3 2013-07 2 2013-06
Upvotes: 0