Reputation: 399
I have the following code written in pandas
pd.concat([
df[['id', 'day','name', 'message', 'answer','status']],
df.groupby(['day', 'id'])[['name']].shift(1).add_prefix('next'),
df.groupby(['day', 'id'])[['message']].shift(1).add_prefix('next'),
df.groupby(['day', 'id'])[['status']].shift(1).add_prefix('next'),
], axis=1)
Have no idea which functions can replace add_prefix and shift(1)
So if the dataframe looks like this
id day name message answer status
0 1 20210101 15:00 Karan Hi Yes Ok
1 1 20210101 15:10 Rohit Wow No Ok
2 2 20210102 15:00 Sahil Bye One Ko
3 2 20210102 15:10 Aryan Ciao Two Ko
The result should look like this
id daytime name message answer status nextname nextmessage nextstatus
0 1 20210101 15:00 Karan Hi Yes Ok Rohit Wow Ok
2 2 20210102 15:00 Sahil Bye One Ko Aryan Ciao Ko
Upvotes: 0
Views: 40
Reputation: 1868
If I understood your code and reproduced correctly here
import pandas as pd
data={'id': [1, 1, 2, 2],
'day': [20210101, 20210101, 20210102, 20210102],
'name': ['Karan', 'Rohit', 'Sahil', 'Aryan'],
'message': ['Hi', 'Wow', 'Bye', 'Ciao'],
'answer': ['Yes', 'No', 'One', 'Two'],
'status': ['Ok', 'Ok', 'Ko', 'Ko']}
df = pd.DataFrame(data)
print(df)
res = pd.concat([
df[['id', 'day','name', 'message', 'answer','status']],
df.groupby(['day', 'id'])[['name']].shift(1).add_prefix('next'),
df.groupby(['day', 'id'])[['message']].shift(1).add_prefix('next'),
df.groupby(['day', 'id'])[['status']].shift(1).add_prefix('next'),
], axis=1)
print(res.head(10))
all it does is for every id
and day
grouping, it takes the previous row and it associates it with the next
columns... e.g. in my previous example, the original dataset is
id day name message answer status
0 1 20210101 Karan Hi Yes Ok
1 1 20210101 Rohit Wow No Ok
2 2 20210102 Sahil Bye One Ko
3 2 20210102 Aryan Ciao Two Ko
and the updated is
id day name message answer status nextname nextmessage nextstatus
0 1 20210101 Karan Hi Yes Ok NaN NaN NaN
1 1 20210101 Rohit Wow No Ok Karan Hi Ok
2 2 20210102 Sahil Bye One Ko NaN NaN NaN
3 2 20210102 Aryan Ciao Two Ko Sahil Bye Ko
If this is the case, then the add_prefix
can be substituted by naming the column properly and the next
by the LAG function
if you create a PG table and insert the same rows with
create table my_test(id int, day int, name varchar, message varchar, answer varchar, status varchar);
insert into my_test values(1, 20210101, 'Karan', 'Hi', 'Yes', 'Ok');
insert into my_test values(1, 20210101, 'Rohit', 'Wow', 'No', 'Ok');
insert into my_test values(2, 20210102, 'Sahil', 'Bye', 'One', 'Ko');
insert into my_test values(2, 20210102, 'Aryan', 'Ciao', 'Two', 'Ko');
Then the query is (assuming you order by ctid
the results, if you have another ordering query substitute it)
select id,
day,
message,
answer,
status,
lag(message,1) over (partition by day, id order by ctid) next_message,
lag(answer,1) over (partition by day, id order by ctid) next_answer,
lag(status,1) over (partition by day, id order by ctid) next_status
from my_test;
With the result being
id | day | message | answer | status | next_message | next_answer | next_status
----+----------+---------+--------+--------+--------------+-------------+-------------
1 | 20210101 | Hi | Yes | Ok | | |
1 | 20210101 | Wow | No | Ok | Hi | Yes | Ok
2 | 20210102 | Bye | One | Ko | | |
2 | 20210102 | Ciao | Two | Ko | Bye | One | Ko
(4 rows)
Upvotes: 1