Igor Hatarist
Igor Hatarist

Reputation: 5442

Optimize two COUNT/DISTINCT queries

I have a table with column1 and column2 (both of them contain TEXT). I want to get:

1) the count of unique rows of column1@table, case-insensitive

2) the count of unique rows of column1@table and column2@table, case-insensitive

SELECT count(*) AS unique_row1 FROM (SELECT DISTINCT lower(column1) FROM table);

SELECT count(*) AS unique_rows12 FROM (SELECT DISTINCT lower(column1),lower(column1) FROM table);

Is there a more efficient way to do it? Is there's a way to do it in a one query? I use SQLite3.

Thanks in advance.

Edit (due to @ypercube's response): Collation was default (case-sensitive, at least I didn't do COLLATE NOCASE anywhere).

Also I've made a test and with COLLATE NOCASE it's quite faster and the numbers are the same:

# time echo "SELECT count(DISTINCT lower(column1)), count(DISTINCT lower(column1 || column2)) FROM table;" | sqlite3 db.sqlite3
1643|5997
echo   0.00s user 0.00s system 25% cpu 0.003 total
sqlite3 db.sqlite3  0.58s user 0.04s system 96% cpu 0.643 total

# time echo "SELECT count(DISTINCT column1), count(DISTINCT column1 || column2) FROM table;" | sqlite3 db.sqlite3              
1658|6199
echo   0.00s user 0.00s system 36% cpu 0.002 total
sqlite3 db.sqlite3  0.42s user 0.04s system 95% cpu 0.483 total

# time echo "SELECT count(DISTINCT column1 COLLATE NOCASE), count(DISTINCT (column1 || column2) COLLATE NOCASE) FROM table;" | sqlite3 db.sqlite3
1643|5997
echo   0.00s user 0.00s system 32% cpu 0.002 total
sqlite3 db.sqlite3  0.43s user 0.04s system 98% cpu 0.481 total

COUNT(DISTINCT column1, column2) shows an error though: wrong number of arguments to function count(), but I hope I've got your idea.

Upvotes: 2

Views: 1449

Answers (2)

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115550

I think that text comparisons are by default case-insensitive. Does this work?

SELECT COUNT(DISTINCT column1)
     , COUNT(DISTINCT column1, column2)
FROM table

Upvotes: 1

DavidEG
DavidEG

Reputation: 5957

You can try this:

SELECT count(distinct lower(column1))
  FROM table

And for the second:

SELECT count(distinct lower(column1 || column2))
  FROM table

Note: In this second case you must use coalesce if your columns can be nulls.

Upvotes: 3

Related Questions