Reputation: 761
I have a DataFrame that looks like the following:
page_id content name
1 {} John
1 {cat, dog} Anne
2 {} Ethan
3 {} John
3 {sea, earth} Anne
3 {earth, green} Ethan
4 {} Mark
I need the value of the content column in each row to be equal to the value of the content column in the next row, but only when both rows share the same page_id. I suppose I need to use the shift() function along with a group by page_id, but I don't know how to put it together.
The expected output would be:
page_id content name
1 {cat, dog} John
1 NaN Anne
2 NaN Ethan
3 {sea, earth} John
3 {earth, green} Anne
3 NaN Ethan
4 NaN Mark
Any help on this issue would be greatly appreciated.
Upvotes: 1
Views: 99
Reputation: 59549
You can avoid the groupby apply given your sorting on 'page_id'. shift everything, then only set the values within each group using where. This will be much faster as the number of groups becomes large.
df['content'] = df.content.shift(-1).where(df.page_id.eq(df.page_id.shift(-1)))
page_id content name
0 1 {cat, dog} John
1 1 NaN Anne
2 2 NaN Ethan
3 3 {earth, sea} John
4 3 {earth, green} Anne
5 3 NaN Ethan
6 4 NaN Mark
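For anyone who wants to reproduce this, here is a minimal sketch of the same shift and where idea on the sample data. Storing content as Python sets and the intermediate name same_page_next are assumptions made for illustration; the question does not say how the column is actually stored.

```python
import pandas as pd

# Sample data from the question; representing content as Python sets is an assumption.
df = pd.DataFrame({
    "page_id": [1, 1, 2, 3, 3, 3, 4],
    "content": [set(), {"cat", "dog"}, set(), set(),
                {"sea", "earth"}, {"earth", "green"}, set()],
    "name": ["John", "Anne", "Ethan", "John", "Anne", "Ethan", "Mark"],
})

# True where the next row belongs to the same page_id (hypothetical helper name).
same_page_next = df["page_id"].eq(df["page_id"].shift(-1))

# Shift the whole column up by one row, then keep only the within-group values;
# rows whose next row is a different page_id (or missing) become NaN.
df["content"] = df["content"].shift(-1).where(same_page_next)

print(df)
```

Note that this relies on all rows of a page_id being contiguous, as they are in the question. If the rows can be interleaved, sort by page_id first or group explicitly.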
Upvotes: 1
Reputation: 88236
Looks like you want a groupby with shift:
df['content'] = df.groupby('page_id').content.shift(-1)
page_id content name
0 1 {cat, dog} John
1 1 NaN Anne
2 2 NaN Ethan
3 3 {earth, sea} John
4 3 {green, earth} Anne
5 3 NaN Ethan
6 4 NaN Mark
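One reason to prefer the grouped version is that it does not depend on row order. The sketch below uses a deliberately interleaved frame (a hypothetical ordering, not the one from the question) and compares a grouped shift with a plain shift masked by where. The variable names grouped and adjacent are illustrative only.

```python
import pandas as pd

# Rows for page_id 1 and 3 interleaved on purpose (hypothetical ordering).
df = pd.DataFrame({
    "page_id": [3, 1, 3, 1, 3],
    "content": [set(), set(), {"sea", "earth"}, {"cat", "dog"}, {"earth", "green"}],
    "name": ["John", "John", "Anne", "Anne", "Ethan"],
})

# Grouped shift: each row receives the content of the next row with the
# same page_id, wherever that row sits in the frame.
grouped = df.groupby("page_id")["content"].shift(-1)

# Plain shift masked with where: only compares physically adjacent rows,
# so on interleaved data no neighbour ever shares the page_id.
adjacent = df["content"].shift(-1).where(df["page_id"].eq(df["page_id"].shift(-1)))

print(grouped)
print(adjacent)
```

On sorted data such as the frame in the question, the two approaches should give the same result, so the choice mostly comes down to whether the ordering can be relied on.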
Upvotes: 2