Reputation: 399

How to rewrite the following code in pandas to sql?

I have the following code written in pandas

pd.concat([
        df[['id', 'day','name', 'message', 'answer','status']], 
        df.groupby(['day', 'id'])[['name']].shift(1).add_prefix('next'),
        df.groupby(['day', 'id'])[['message']].shift(1).add_prefix('next'),
        df.groupby(['day', 'id'])[['status']].shift(1).add_prefix('next'),
    ], axis=1)

Have no idea which functions can replace add_prefix and shift(1)

So if the dataframe looks like this

   id       day        name message answer status
0   1  20210101 15:00  Karan      Hi    Yes     Ok
1   1  20210101 15:10  Rohit     Wow     No     Ok
2   2  20210102 15:00  Sahil     Bye    One     Ko
3   2  20210102 15:10  Aryan    Ciao    Two     Ko

The result should look like this

   id    daytime       name message answer status nextname nextmessage nextstatus
0   1  20210101 15:00  Karan      Hi    Yes     Ok      Rohit     Wow     Ok
2   2  20210102 15:00  Sahil     Bye    One     Ko      Aryan    Ciao     Ko

Upvotes: 0

Answers (1)

Ftisiot

Reputation: 1868

If I understood your code and reproduced correctly here

import pandas as pd

data={'id':      [1, 1, 2, 2],
      'day':     [20210101, 20210101, 20210102, 20210102],
      'name':    ['Karan', 'Rohit', 'Sahil', 'Aryan'],
      'message': ['Hi', 'Wow', 'Bye', 'Ciao'],
      'answer':  ['Yes', 'No', 'One', 'Two'],
      'status':  ['Ok', 'Ok', 'Ko', 'Ko']}
df = pd.DataFrame(data)
print(df)

res = pd.concat([
    df[['id', 'day','name', 'message', 'answer','status']],
    df.groupby(['day', 'id'])[['name']].shift(1).add_prefix('next'),
    df.groupby(['day', 'id'])[['message']].shift(1).add_prefix('next'),
    df.groupby(['day', 'id'])[['status']].shift(1).add_prefix('next'),
], axis=1)

print(res.head(10))

all it does is for every id and day grouping, it takes the previous row and it associates it with the next columns... e.g. in my previous example, the original dataset is

   id       day   name message answer status
0   1  20210101  Karan      Hi    Yes     Ok
1   1  20210101  Rohit     Wow     No     Ok
2   2  20210102  Sahil     Bye    One     Ko
3   2  20210102  Aryan    Ciao    Two     Ko

and the updated is

   id       day   name message answer status nextname nextmessage nextstatus
0   1  20210101  Karan      Hi    Yes     Ok      NaN         NaN        NaN
1   1  20210101  Rohit     Wow     No     Ok    Karan          Hi         Ok
2   2  20210102  Sahil     Bye    One     Ko      NaN         NaN        NaN
3   2  20210102  Aryan    Ciao    Two     Ko    Sahil         Bye         Ko

If this is the case, then the add_prefix can be substituted by naming the column properly and the next by the LAG function

if you create a PG table and insert the same rows with

create table my_test(id int, day int, name varchar, message varchar, answer varchar, status varchar);
insert into my_test values(1, 20210101, 'Karan', 'Hi', 'Yes', 'Ok');
insert into my_test values(1, 20210101, 'Rohit', 'Wow', 'No', 'Ok');
insert into my_test values(2, 20210102, 'Sahil', 'Bye', 'One', 'Ko');
insert into my_test values(2, 20210102, 'Aryan', 'Ciao', 'Two', 'Ko');

Then the query is (assuming you order by ctid the results, if you have another ordering query substitute it)

select id, 
    day, 
    message, 
    answer, 
    status,
    lag(message,1) over (partition by day, id order by ctid) next_message,
    lag(answer,1) over (partition by day, id order by ctid) next_answer,
    lag(status,1) over (partition by day, id order by ctid) next_status
from my_test;

With the result being

 id |   day    | message | answer | status | next_message | next_answer | next_status 
----+----------+---------+--------+--------+--------------+-------------+-------------
  1 | 20210101 | Hi      | Yes    | Ok     |              |             | 
  1 | 20210101 | Wow     | No     | Ok     | Hi           | Yes         | Ok
  2 | 20210102 | Bye     | One    | Ko     |              |             | 
  2 | 20210102 | Ciao    | Two    | Ko     | Bye          | One         | Ko
(4 rows)

Upvotes: 1

How to rewrite the following code in pandas to sql?

Answers (1)

Related Questions