Reputation: 373
Count criteria: A "true" count, by month-year, of the number of students (indexed by a unique Student ID) at each School by School Service Type.
I am currently using MS-Access. I need to be able to extract counts for a database with similar data as the sample data below. Each row is a payment observation.
Sample data:
Student ID | School | School Service Type | PaymentStartDate | PaymentEndDate |
001 ABC ED 01/02/2010 02/04/2012
001 ABC ED 01/02/2010 01/05/2010
001 ABC ED 04/02/2010 05/05/2010
001 DEF EZ 01/02/2010 02/04/2012
001 RR 02/02/2012 02/03/2012
002 ABC ED 02/02/2010 02/03/2011
002 ABC EZ 02/02/2010 06/03/2010
002 GHI ED 02/04/2011 02/04/2012
003 ABC ED 02/02/2011 02/03/2012
003 DEF ED 01/02/2010 08/03/2010
003 RR 02/02/2011 02/03/2011
004 RR 02/02/2011 02/03/2011
005 GHI ED 08/02/2010 02/04/2011
006 GHI ED 08/02/2010 08/02/2010
What I want my count to look like:
Month | Year | School | ED | EZ | RR |
01 2010 ABC 1 0 0
01 2010 DEF 1 1 0
01 2010 GHI 0 0 0
02 2010 ABC 2 1 0
03 2010 ABC 2 1 0
Example
Desire: I want a true count of students, across schools, by service type, for January 2010
Biggest hurdle - Duplicates
Please take a look at the observations for Student ID 001
. Between January 2010 and February 2012, School ABC
received 3 payments for the exact same Service Type (ED)
for Student 001.
I want my January 2010 count of students that received ED
services at school ABC
to return a value of 1, since only one student (Student 001
) received services for ED
at that school for that time.
However, when I conduct a crosstab to retrieve my counts with this data as it is, it returns a value of 2 (for two students). The reason being that both payment #1 and payment #2 of Student 001
meet my month-year criteria of January 2010.
Payment #1 meets the criteria because January 2010 lies within the payment date range of 01/02/2010 - 02/04/2012*.
Payment #2 also meets the criteria because January 2010 lies within the payment date range of 01/02/2010 - 01/05/2010.
Payment #3 does not meet the criteria because January 2010 is not within the date range for that row (04/02/2010 - 05/05/2010).
*You get these dates from the PaymentStartDate
and PaymentEndDate
respectively.
I prepared an Excel version of the sample data here:Link to Excel file
Please keep in mind that:
The time elapsed between the PaymentStartDate and PaymentEndDate is extremely variable across the board, and ranges from 0 days to 122 days.
Many times, there are truly unique payment observations where the time lapsed between PaymentStartDate and PaymentEndDate is 0 days (take a look at the payment row for Student ID 006 in the data above). Therefore, getting rid of rows that do not meet a specified "time lapsed between PaymentStartDate and PaymentEndDate" criteria is not an option, since many times they are not the duplicates I am trying to get rid of for my counts.
Yes, for certain Service Types, there is no School value.
As always, any helpful ideas of advice on how I can resolve this duplicates issue, and retrieve my true count values within MS-Access are greatly appreciated. Thank you for your time.
Edit (02/10/2014): Changed the count output above to reflect the sample data I provided in my post. My apologies for not doing so before.
Upvotes: 3
Views: 245
Reputation: 123474
Here's one way to tackle it. For sample data in a table named [Payments]
Payment Row Student ID School School Service Type PaymentStartDate PaymentEndDate
----------- ---------- ------ ------------------- ---------------- --------------
1 001 ABC ED 2010-01-02 2012-02-04
2 001 ABC ED 2010-01-02 2010-01-05
3 001 ABC ED 2010-04-02 2010-05-05
4 001 DEF EZ 2010-01-02 2012-02-04
5 001 RR 2012-02-02 2012-02-03
6 002 ABC ED 2010-02-02 2011-02-03
7 002 ABC EZ 2010-02-02 2010-06-03
8 002 GHI ED 2011-02-04 2012-02-04
9 003 ABC ED 2011-02-02 2012-02-03
10 003 DEF ED 2010-01-02 2010-08-03
11 003 RR 2011-02-02 2011-02-03
12 004 RR 2011-02-02 2011-02-03
13 005 GHI ED 2010-08-02 2011-02-04
14 006 GHI ED 2010-08-02 2010-08-02
if we create a saved query in Access named [PaymentsYearMonth]
SELECT
[Student ID],
School,
[School Service Type],
(Year(PaymentStartDate) * 100) + Month(PaymentStartDate) AS StartYYYYMM,
(Year(PaymentEndDate) * 100) + Month(PaymentEndDate) AS EndYYYYMM
FROM Payments
it will give us
Student ID School School Service Type StartYYYYMM EndYYYYMM
---------- ------ ------------------- ----------- ---------
001 ABC ED 201001 201202
001 ABC ED 201001 201001
001 ABC ED 201004 201005
001 DEF EZ 201001 201202
001 RR 201202 201202
002 ABC ED 201002 201102
002 ABC EZ 201002 201006
002 GHI ED 201102 201202
003 ABC ED 201102 201202
003 DEF ED 201001 201008
003 RR 201102 201102
004 RR 201102 201102
005 GHI ED 201008 201102
006 GHI ED 201008 201008
To generate the Year/Month pairs that the data covers, we can use a table named [MonthNumbers]
MonthNumber
-----------
1
2
3
4
5
6
7
8
9
10
11
12
and a table named [YearNumbers] which covers every year that might possibly appear in the data. For safety one might include every year from, say, 1901 to 2525, but for illustration we'll just use
YearNumber
----------
2009
2010
2011
2012
2013
Now we can create a saved query named [MonthsToReport] to give us the rows that might have results
SELECT
yn.YearNumber,
mn.MonthNumber,
(yn.YearNumber * 100) + mn.MonthNumber AS YYYYMM
FROM
YearNumbers AS yn,
MonthNumbers AS mn
WHERE ((yn.YearNumber * 100) + mn.MonthNumber)>=DMin("StartYYYYMM","PaymentsYearMonth")
AND ((yn.YearNumber * 100) + mn.MonthNumber)<=DMax("EndYYYYMM","PaymentsYearMonth")
Sorted by Year and Month that would look like
YearNumber MonthNumber YYYYMM
---------- ----------- ------
2010 1 201001
2010 2 201002
2010 3 201003
2010 4 201004
2010 5 201005
2010 6 201006
2010 7 201007
2010 8 201008
2010 9 201009
2010 10 201010
2010 11 201011
2010 12 201012
2011 1 201101
2011 2 201102
2011 3 201103
2011 4 201104
2011 5 201105
2011 6 201106
2011 7 201107
2011 8 201108
2011 9 201109
2011 10 201110
2011 11 201111
2011 12 201112
2012 1 201201
2012 2 201202
Now to create a query that gives us distinct instances of Year/Month/Student/School/Type
SELECT DISTINCT
mtr.YearNumber,
mtr.MonthNumber,
pym.[Student ID],
pym.School,
pym.[School Service Type]
FROM
MonthsToReport AS mtr
INNER JOIN
PaymentsYearMonth AS pym
ON mtr.YYYYMM>=pym.StartYYYYMM
AND mtr.YYYYMM<=pym.EndYYYYMM
...wrap that in an aggregation query to count up the (now unique) [Student ID] values
SELECT
YearNumber,
MonthNumber,
School,
[School Service Type],
COUNT(*) AS CountOfStudents
FROM
(
SELECT DISTINCT
mtr.YearNumber,
mtr.MonthNumber,
pym.[Student ID],
pym.School,
pym.[School Service Type]
FROM
MonthsToReport AS mtr
INNER JOIN
PaymentsYearMonth AS pym
ON mtr.YYYYMM>=pym.StartYYYYMM
AND mtr.YYYYMM<=pym.EndYYYYMM
) AS distinctQuery
GROUP BY
YearNumber,
MonthNumber,
School,
[School Service Type]
and then use that as the FROM subquery in our crosstab query
TRANSFORM Nz(First(CountOfStudents),0) AS n
SELECT
YearNumber,
MonthNumber,
School
FROM
(
SELECT
YearNumber,
MonthNumber,
School,
[School Service Type],
COUNT(*) AS CountOfStudents
FROM
(
SELECT DISTINCT
mtr.YearNumber,
mtr.MonthNumber,
pym.[Student ID],
pym.School,
pym.[School Service Type]
FROM
MonthsToReport AS mtr
INNER JOIN
PaymentsYearMonth AS pym
ON mtr.YYYYMM>=pym.StartYYYYMM
AND mtr.YYYYMM<=pym.EndYYYYMM
) AS distinctQuery
GROUP BY
YearNumber,
MonthNumber,
School,
[School Service Type]
) AS countQuery
GROUP BY
YearNumber,
MonthNumber,
School
PIVOT [School Service Type]
returning
YearNumber MonthNumber School ED EZ RR
---------- ----------- ------ -- -- --
2010 1 ABC 1 0 0
2010 1 DEF 1 1 0
2010 2 ABC 2 1 0
2010 2 DEF 1 1 0
2010 3 ABC 2 1 0
2010 3 DEF 1 1 0
2010 4 ABC 2 1 0
2010 4 DEF 1 1 0
2010 5 ABC 2 1 0
2010 5 DEF 1 1 0
2010 6 ABC 2 1 0
2010 6 DEF 1 1 0
2010 7 ABC 2 0 0
2010 7 DEF 1 1 0
2010 8 ABC 2 0 0
2010 8 DEF 1 1 0
2010 8 GHI 2 0 0
2010 9 ABC 2 0 0
2010 9 DEF 0 1 0
2010 9 GHI 1 0 0
2010 10 ABC 2 0 0
2010 10 DEF 0 1 0
2010 10 GHI 1 0 0
2010 11 ABC 2 0 0
2010 11 DEF 0 1 0
2010 11 GHI 1 0 0
2010 12 ABC 2 0 0
2010 12 DEF 0 1 0
2010 12 GHI 1 0 0
2011 1 ABC 2 0 0
2011 1 DEF 0 1 0
2011 1 GHI 1 0 0
2011 2 0 0 2
2011 2 ABC 3 0 0
2011 2 DEF 0 1 0
2011 2 GHI 2 0 0
2011 3 ABC 2 0 0
2011 3 DEF 0 1 0
2011 3 GHI 1 0 0
2011 4 ABC 2 0 0
2011 4 DEF 0 1 0
2011 4 GHI 1 0 0
2011 5 ABC 2 0 0
2011 5 DEF 0 1 0
2011 5 GHI 1 0 0
2011 6 ABC 2 0 0
2011 6 DEF 0 1 0
2011 6 GHI 1 0 0
2011 7 ABC 2 0 0
2011 7 DEF 0 1 0
2011 7 GHI 1 0 0
2011 8 ABC 2 0 0
2011 8 DEF 0 1 0
2011 8 GHI 1 0 0
2011 9 ABC 2 0 0
2011 9 DEF 0 1 0
2011 9 GHI 1 0 0
2011 10 ABC 2 0 0
2011 10 DEF 0 1 0
2011 10 GHI 1 0 0
2011 11 ABC 2 0 0
2011 11 DEF 0 1 0
2011 11 GHI 1 0 0
2011 12 ABC 2 0 0
2011 12 DEF 0 1 0
2011 12 GHI 1 0 0
2012 1 ABC 2 0 0
2012 1 DEF 0 1 0
2012 1 GHI 1 0 0
2012 2 0 0 1
2012 2 ABC 2 0 0
2012 2 DEF 0 1 0
2012 2 GHI 1 0 0
Upvotes: 1