Mark Kewton
Mark Kewton

Reputation: 121

Real time data handling in postgresql from pandas

I have 2 tables in PostgreSQL:-

  1. "student" table
student_id name   score
1         Adam     10
2         Brian    9
  1. "student_log" table:-
log_id student_id score
1         1       10
2         2       9

I have a python script which fetches a DataFrame with columns - "name" and "score" and then populates it to the student table.

I want to update the student and student_log table whenever the "score" changes for a student. Also, if there is a new student name in the dataframe, I want to add another row for it in the student table as well as maintain its record in the "student_log" table. Can anyone suggest how it can be done?

Let us consider the new fetched DataFrame looks like this:-

name score
Adam  7
Lee   5

Then the Expected Result is:-

"student" table

student_id name   score
1         Adam     7
2         Brian    9
3         Lee      5

"student_log" table:-

log_id student_id score
1         1       10
2         2       9
3         1       7    
4         3       5

Upvotes: 1

Views: 249

Answers (2)

Mark Kewton
Mark Kewton

Reputation: 121

I finally found a good answer. I used trigger, function and CTE.

I create a function to log changes along with a trigger to handle the updates. Following is the code.

CREATE OR REPLACE FUNCTION log_last_changes()
  RETURNS TRIGGER 
  LANGUAGE PLPGSQL
  AS
$$
DECLARE
serial_num integer;
BEGIN
    IF NEW.name <> OLD.name OR NEW.score <> OLD.score
    THEN
         SELECT SETVAL('log_id_seq', (select max(id) from log)) into serial_num;
     INSERT INTO log(student_id,score)
     VALUES(NEW.id,NEW.score)
         ON CONFLICT DO NOTHING;
    END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER log_student
  AFTER UPDATE
  ON student
  FOR EACH ROW
  EXECUTE PROCEDURE log_last_changes();

THE CTE expression is as follow:-

WITH new_values(id, name, score) AS ( 
    values
        (1,'Adam',7),
        (2,'Brian',9),
        (3,'Lee',5)      
),
upsert AS 
    ( 
        UPDATE student s 
        SET NAME = nv.name,
        SCORE = nv.score
        FROM new_values nv, student s2
        WHERE 
            s.id = nv.id and s.id = s2.id 
        Returning s.* 
    )
    INSERT INTO student select id, name, score 
    FROM 
        new_values 
    WHERE NOT EXISTS (
            SELECT 1  from upsert up where up.id=new_values.id
    );

Upvotes: 1

nay
nay

Reputation: 1775

I guess you try to diff two dataframe
here is a example

#old student dataframe
old_pd:pd.DataFrame

#new student dataframe
new_pd:pd.DataFrame

joined_pd = new_pd.join(old_pd,on='name',lsuffix='_new',rsuffix='_old')
diff_pd = joined_pd[joined_pd['score_new']!=joined_pd['score_old']]

#then insert all diff_pd to student_log table.and update to student table

Upvotes: 0

Related Questions