Reputation: 9441
I have keys that may change slightly in both tables, because they were both independently entered via free text. Any human eye would spot that they are the same, but the tables are too big to created regular expression rules for each type of mismatch.
For example, these should match:
This Is Key One
and This is Key one
Subject:the Description
and Subject: the Description
I am key two!
and I am key two
How do I create such a query? (if it's even possible in BQ)
Upvotes: 1
Views: 328
Reputation: 19835
first you need to identify every single possible way that your keys can be similar. for example one is ignoring case, another is removing special characters like "!". you can make a user-defined function that handles all such cases to "clean" or "standarize" the keys.
create a new clean table using the custom function and query on the clean table (and always clean the keys before appending more rows from then on)
https://cloud.google.com/bigquery/user-defined-functions
Upvotes: 1
Reputation: 161
An alternative to using a user-defined function as Zig suggests would be to use the various BigQuery string functions to clean your data.
Whatever method you use, though, you'll need to do it either in a subquery prior to the join or you'll have to do it as a preliminary step before your query. BigQuery at this time only support join predicates which are conjunctions of equality comparisons (i.e., a.key1 = b.key1 AND a.key2 = b.key2 etc.).
Upvotes: 3