Jatin Mehrotra
Jatin Mehrotra

Reputation: 11523

REDSHIFT :Copy only new records and ignore the existing ones

I am using an external table that reads JSON files from s3. I have transferred data from external table to internal table using this query

insert into jatinanalysis (title,url,author,published_date,category)
select b.title,b.link,b.author,b.published_date,category
FROM  jatinspectrum.extable a, a.enteries b,b.category category

but every time i run this query it is creating duplicate, I want only new queries, existing ones should be ignored.

update : tried this https://stackoverflow.com/a/656027/13126651 but no luck

insert into jatinanalysis (title,url,author,published_date,category)
select distinct b.title,b.link,b.author,b.published_date,category
FROM  jatinspectrum.extable a, a.enteries b,b.category category
WHERE NOT EXISTS(SELECT * 
                 FROM jatinanalysis
                 WHERE (jatinspectrum.extable.b.title=jatinanalysis.title and
                        jatinspectrum.extable.b.link=jatinanalysis.url and
                        jatinspectrum.extable.b.author=jatinanalysis.author and
                        jatinspectrum.extable.b.b.published_date=jatinanalysis.published_date and
                        jatinspectrum.extable.category=jatinanalysis.category)

Upvotes: 0

Views: 501

Answers (1)

Jatin Mehrotra
Jatin Mehrotra

Reputation: 11523

This worked for me

insert into jatinanalysis (title,url,author,published_date,category)
select b.title,b.link,b.author,b.published_date,category
FROM  jatinspectrum.extable a, a.enteries b,b.category category
EXCEPT
SELECT title,url,author,published_date,category FROM jatinanalysis;

Upvotes: 1

Related Questions