Gibson
Gibson

Reputation: 2085

Chartkick group and order by at the same time

I have a working chart in chartkick but months are not showing in order

(October -> March -> February)

I need them in a correct order

(February -> March -> October )

My schema

  create_table "business_data", force: :cascade do |t|
    t.integer "net_sales"
    t.text "next_steps"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
    t.string "month_name"
    t.date "start_date"
    t.date "end_date"
  end

Controller

@business_data = BusinessDatum.all.order(start_date: :asc)

View

<%= line_chart @business_data.group(:month_name).sum(:net_sales) %>

By adding the .order I get the following error:

PG::GroupingError: ERROR:  column "business_data.start_date" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...a" GROUP BY "business_data"."month_name" ORDER BY "business_...

¿How can I order the months in the correct order?

Upvotes: 1

Views: 367

Answers (2)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656586

Like the error msg points out, in a SELECT query with aggregate functions, ORDER BY only allows references to grouped or aggregated columns. Other references would be ambiguous. But - as a quick fix - you can work with an expression based on the grouped column month_name:

SELECT month_name, sum(net_sales)
FROM   business_data
GROUP  BY month_name
ORDER  BY to_date(month_name, 'Month');  -- date derived from month name

If start_date is guaranteed to lie within month_name, you can also:

SELECT month_name, sum(net_sales)
FROM   business_data
GROUP  BY date_trunc('month', start_date), month_name
ORDER  BY date_trunc('month', start_date);

This also enforces the same year (like you probably want).

Storing the name of the month is not very useful to begin with. (What about the year anyway?)

If possible, fix your relational design to store a date instead. Or, if start_date always lies within month_name, simply drop the column month_name. It's just redundant ballast. Don't store redundant data. You can derive the name of the month from start_date cheaply with to_char():

SELECT to_char(start_date, 'Month') AS month_name

However, to group and order by month, throw in date_trunc() like above:

SELECT to_char(date_trunc('month', start_date), 'Month') AS month_name, sum(net_sales)
FROM   business_data
GROUP  BY date_trunc('month', start_date)
ORDER  BY date_trunc('month', start_date);

db<>fiddle here

Upvotes: 2

Ankit
Ankit

Reputation: 6980

tl;dr

You can try finding the aggregates sales per month_name, and then sort them

Here is what that looks like

<%= line_chart BusinessDatum.all.group(:month_name).sum(:net_sales).sort_by {|s| Date::MONTHNAMES.index(s[0])} %>

Note: I put the whole query on a single line, but that should work even if you break it between controller and action (or preferably move completely into the controller action)

Longer Discussion:

Based on the question, I am going to guess that there is a conceptual misunderstanding about how ActiveRecord/Postgres is executing your query. If you look at the original query

BusinessDatum.all.order(start_date: :asc).group(:month_name).sum(:net_sales)

the SQL that this expression would generate is

; SQL generated by AR
SELECT 
  SUM("net_sales"), "month_name"
FROM 
  "business_data" 
GROUP BY 
  "month_name" 
ORDER BY 
  "start_date" ASC

I am gonna guess that you were expecting Postgres to first order the data by start_date and then do a group_by (assuming that if grouping is applied to the ordered data, the aggregations will be ordered as well). However, that is not how Postgres works.

In fact, if we change the query to have order by before group by, it would be an invalid query. The following is an invalid query

; this query is invalid because the order of order by and group by is not correct
SELECT 
  SUM("net_sales"), "month_name"
FROM 
  "business_data" 
ORDER BY 
  "start_date" ASC
GROUP BY 
  "month_name"

Let's dig a little more. If you look at the SQL generated by AR, the error that you are seeing gets clear. Postgres is going to take the data in the table, group the data by month, and then sum the net_sales. When it does that the outcome will contain only the sum(net_sales) and month_name. The data will look something like this

sum(net_sales) month_name
100 October
200 February
300 March

As you can see, the grouped data does not contain the start_date. In fact, it probably would not make sense to have a single start_date here as the start_date column can contain lots of values. This is why the error you saw gave you two options to fix:

  1. Either use start_date in the group by clause (so that the start_date is part of the grouped data (which will then allow you to order by start_date. If you add start_date in the group clause though, the data will look something like this (the month_name and start_date are no longer unique in the grouped data
sum(net_sales) month_name start_date
50 October Oct 1, 2022
50 October Oct 2, 2022
200 February Feb 1, 2022
300 March March 5, 2022
  1. use the start_date in the aggregate function (such as max(start_date) )

I don't think either of the two suggestions makes sense in this case.

Note, that the aggregated data does not guarantee that the data is ordered by month_name by default. You can try order by month_name. However, I do not think that will give you data in the order you want because order by month_name will use the alphabetic ordering of months.

So, one solution would be to generate the aggregate on the database as follows:

# you do not need the "all" here; it should work without it
@business_data = BusinessDatum.all.group(:month_name).sum(:net_sales)`

The generated data will contain the month_name (as strings). The output of the above expression will be a hash

{"March" => 300, "February" => 200, "October" => 300}

You can then sort this data on the application server (i.e. within rails)

@business_data.sort_by {|s| Date::MONTHNAMES.index(s[0])}

This will generate the data in the calendar month order

[ ["February", 200], ["March", 300], ["October", 100] ]

Here the sort method converted the hash to an array while sorting, but that is okay (probably preferable over hash as arrays communicate ordering) since chartkick can take array or hash as arguments for line chart. If, for some reason, you do want a hash, you can convert it to hash by using the to_h function on the array.

Caveat: One assumption that I made here is that Business Data can have multiple entries for a given month, where the start_date for entries in a given month, may or may not be unique.

Upvotes: 1

Related Questions