nathan
nathan

Reputation: 1

Why is this query returning two rows?

I am trying to run a rolling tally as time goes on, so the second (duplicate) row is adding the same value again and is mucking up the count. How do I stop this from happening? I only want 1 row per date to then add the next date. I have tried distinct but the second row count still appears so I think I need a different command.

See on the output code at the bottom, the last column is total, the second last is the new amount to add to the total

Select 
    dea.continent, dea.location, dea.date, dea.population, 
    vac.new_vaccinations,
    sum(cast(vac.new_vaccinations as bigint)) over (partition by dea.location order by dea.location, dea.date)
from 
    PortfolioProject..CovidDeaths dea
join 
    PortfolioProject..CovidVaccinations vac on dea.location = vac.location
                                            and dea.date = vac.date
where 
    dea.continent is not null
order by 
    2,3
Europe  Austria 2020-12-28 00:00:00.000 8922082 1345    2690
Europe  Austria 2020-12-28 00:00:00.000 8922082 1345    2690
Europe  Austria 2020-12-29 00:00:00.000 8922082 1694    6078
Europe  Austria 2020-12-29 00:00:00.000 8922082 1694    6078
Europe  Austria 2020-12-30 00:00:00.000 8922082 1429    8936
Europe  Austria 2020-12-30 00:00:00.000 8922082 1429    8936
Europe  Austria 2020-12-31 00:00:00.000 8922082 25      8986
Europe  Austria 2020-12-31 00:00:00.000 8922082 25      8986
Europe  Austria 2021-01-01 00:00:00.000 8922082 19      9024
Europe  Austria 2021-01-01 00:00:00.000 8922082 19      9024

Upvotes: 0

Views: 84

Answers (1)

bekir çelik
bekir çelik

Reputation: 168

Partition by clause don't reduce returned rows. Just add aggregate data to rows. Because of aggregate on CovidVaccinations table, you assumes there are many records for vac.location + vac.date.

So you should first prepare data from CovidVaccinations with;

SELECT vac.location, vac.date, sum( vac.new_vaccinations )
FROM portfolioproject..CovidVaccinations vac
GROUP BY vac.location, vac.date

then join with CovidDeaths without aggregate again, ike that;

WITH vac
( SELECT vac.location, vac.date, sum( vac.new_vaccinations ) as total
    FROM portfolioproject..CovidVaccinations vac
    GROUP BY vac.location, vac.date
)
SELECT vac.total, ... FROM CovidDeaths dea, vac
WHERE dea.location = vac.location
 AND dea.date = vac.date

If CovidDeaths table also has many record per dea.continent, dea.location, dea.date, dea.population, than group its data as well, then join with other result

Upvotes: 1

Related Questions