LASheets
LASheets

Reputation: 141

How do I use Query to find % distribution for each row across multiple rows?

I have a large list of dates across multiple years. What I want to be able to do is use QUERY to find the percentage distribution across each individual year.

I have no issue getting the counts in the query structured exactly as I want, but I haven't been able to find a way to get the Query to calculate the percentage.

=QUERY(ARRAYFORMULA({YEAR($A$2:$A),MONTH($A$2:$A)}),"select Col1,count(Col1) where Col1 is not null group by Col1 pivot Col2",0)

This produces a table that looks like this:

1 2 3 4 5 6 7 8 9 10 11 12
2021 128 215 100 215 26 15 281 182 115 210 111 103
2022 215 476 815 121 621 82 121 635 418 574 922 41

What I would like to see is this:

1 2 3 4 5 6 7 8 9 10 11 12
2021 7.5% 12.6% 5.9% 12.6% 1.5% .9% 16.5% 10.7% 6.8% 12.4% 6.5% 6.1%
2022 4.3% 9.4% 16.2% 2.4% 12.3% 1.6% 2.4% 12.6% 8.3% 11.4% 18.3% .8%

Each row (year) sums to 100%. I cannot find a way to either include something within the data section of the query or within the query section to divide those individual counts by the total count for the year. I've tried nesting some queries but that either doesn't work, or I haven't been able to find the correct combination of queries.

Upvotes: 1

Views: 345

Answers (1)

vilc
vilc

Reputation: 333

Use the following formula and format the appropriate month columns as percent:

=QUERY({
QUERY(A2:A,"select year(A), count(A) where A is not null group by year(A) pivot month(A)",0),
QUERY(A2:A,"select count(A) where A is not null group by year(A)",0)},
"select Col1, Col2/Col14, Col3/Col14, Col4/Col14, Col5/Col14, Col6/Col14, Col7/Col14, Col8/Col14, Col9/Col14, Col10/Col14, Col11/Col14, Col12/Col14, Col13/Col14
label Col1 'year', Col2/Col14 '1', Col3/Col14 '2', Col4/Col14 '3', Col5/Col14 '4', Col6/Col14 '5', Col7/Col14 '6', Col8/Col14 '7', Col9/Col14 '8', Col10/Col14 '9', Col11/Col14 '10', Col12/Col14 '11', Col13/Col14 '12'",1)

The first query is a simplified version of what you did, it returns identical data. Query supports builtin scalar functions year() and month(), so no need for the ARRAYFORMULA({YEAR(),MONTH()}) that you used.

The second query simply creates a column with total counts in each year.

The third query takes the result of the two previous queries as input, and divides counts for each month (Col2Col13) by the total count for the year (Col14).

Result for some simple data: Screenshot of the result

Upvotes: 1

Related Questions