David542
David542

Reputation: 110257

Google BigQuery case insensitive match

How would I run the following query, like I would again mysql:

SELECT * FROM [integrated-myth-15.testing_data_100k]
WHERE title='down in la'

Now it will match if I have a perfect case-sensitive string match, but how would I do it case insensitive for "down in la"? I'm working from the Web console.

Upvotes: 20

Views: 42575

Answers (3)

Luke Puplett
Luke Puplett

Reputation: 45135

Excuse me if this is way off. I have not used the product, I'm reading the docs to research it.

I've found the following which might be of use.

CONTAINS_SUBSTR

Performs a normalized, case-insensitive search to see if a value exists as a substring in an expression. Returns TRUE if the value exists, otherwise returns FALSE.

https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#contains_substr

This is interesting because the case sensitivity seems to be built-in to the function, which tells me that there may be others which work this way, and that it'll just work the way most people would expect it to :)

COLLATE

Also, I wonder whether you can apply a collation at query time to help.

https://cloud.google.com/bigquery/docs/reference/standard-sql/collation-concepts#collate_define

-- Assume there is a table with this column declaration:
CREATE TABLE table_a
(
    col_a STRING COLLATE 'und:ci',
    col_b STRING COLLATE '',
    col_c STRING,
    col_d STRING COLLATE 'und:ci'
);

-- This runs. Column 'b' has a collation specification and the
-- column 'c' does not.
SELECT STARTS_WITH(col_b_expression, col_c_expression)
FROM table_a;

-- This runs. Column 'a' and 'd' have the same collation specification.
SELECT STARTS_WITH(col_a_expression, col_d_expression)
FROM table_a;

-- This runs. Even though column 'a' and 'b' have different
-- collation specifications, column 'b' is considered the default collation
-- because it's assigned to an empty collation specification.
SELECT STARTS_WITH(col_a_expression, col_b_expression)
FROM table_a;

-- This works. Even though column 'a' and 'b' have different
-- collation specifications, column 'b' is updated to use the same
-- collation specification as column 'a'.
SELECT STARTS_WITH(col_a_expression, COLLATE(col_b_expression, 'und:ci'))
FROM table_a;

-- This runs. Column 'c' does not have a collation specification, so it uses the
-- collation specification of column 'd'.
SELECT STARTS_WITH(col_c_expression, col_d_expression)
FROM table_a;

Upvotes: 2

Anshul Gupta
Anshul Gupta

Reputation: 59

The best way is to append "IGNORE CASE" at the end of your query.

SELECT * FROM [integrated-myth-15.testing_data_100k] WHERE title='down in la' IGNORE CASE

Note: This will work only with legacy SQL

As mentioned in the official documentations:

String functions operate on string data. String constants must be enclosed with single or double quotes. String functions are case-sensitive by default. You can append IGNORE CASE to the end of a query to enable case- insensitive matching. IGNORE CASE works only on ASCII characters and only at the top level of the query.

Upvotes: 4

Elliott Brossard
Elliott Brossard

Reputation: 33745

The standard way to do this is using LOWER or UPPER on the input string, e.g.:

#legacySQL
SELECT * FROM [integrated-myth-15.testing_data_100k]
WHERE LOWER(title) = 'down in la';

Or:

#standardSQL
SELECT * FROM `integrated-myth-15.testing_data_100k`
WHERE LOWER(title) = 'down in la';

Upvotes: 41

Related Questions