Reputation: 55
based on this question: bigquery update table using LIKE returns "UPDATE/MERGE must match at most one source row for each target row" I came up with a follow-up question that might end up in a complete different solution. That's why I posted a new question rather than a comment.
I am referring to the solution posted by @jon-armstrong. After testing it with different data, there is still the issue that it does not work if there are duplicate rows in 'table1'. Of course this problem comes from the 'GROUP BY' statement - and w/o this, the UPDATE query does not work, resulting in the error message stated in my original question. It doesn't work either, if I 'GROUP' every value, or group nothing as suggested here. I also came up with the idea of using 'PARTITION BY', however, I get a syntax error in BigQuery.
There can be duplicates in my 'table1' (Data) and my mapping table 'table2'. So to make it very precise, this is my goal:
Table1 (data table)
textWithFoundItemInIt | foundItem
-------------------------------------------
hallo Adam |
Bert says hello |
Bert says byebye |
Want to find "Caesar"bdjehg |
Want to find "Caesar"bdjehg |
Want to find "Caesar"again |
Want to find "Caesar"again and also Bert | <== It is no problem, if only MAX()=Caesar or MIN()=Bert name is found.
Want to find "CaesarCaesar"again and again | <== This is no problem, just finding one Caesar is enough
Table2 (mapping table)
mappingItem
------------
Adam
Bert
Caesar
Bert
Caesar
Adam
Expected result
textWithFoundItemInIt | foundItem
--------------------------------------------
hallo Adam | Adam
Bert says hello | Bert
Bert says byebye | Bert
Want to find "Caesar"bdjehg | Caesar
Want to find "Caesar"bdjehg | Caesar
Want to find "Caesar"again | Caesar
Want to find "Caesar"again and also Bert | Caesar [or Bert]
Want to find "CaesarCaesar"again and again | Caesar
It doesn't matter which Adam from Table2 is found and inserted into Table1, they will be the same. So it is even okay if the first Adam will be over written by the second Adam, or if the query just stops to search any further once one Adam is found.
If I execute Jon's 'SELECT' query, it would result in:
textWithFoundItemInIt | foundItem
--------------------------------------------
hallo Adam | Adam
Bert says hello | Bert
Bert says byebye | Bert
Want to find "Caesar"bdjehg | Caesar
Want to find "Caesar"again | Caesar
Want to find "Caesar"again and also Bert | Caesar (if MAX() chosen)
Want to find "CaesarCaesar"again and again | Caesar
It (correctly) omits the second "Want to find "Caesar"bdjehg", but that's unfortunately not what I need.
If it is easier, it would also be okay that in cases that two names are found in one row
textWithFoundItemInIt | foundItem
---------------------------------------------
hallo Adam and Bert | Adam, Bert
Bert says hello to Caesar | Bert, Caesar
or
textWithFoundItemInIt | foundItem1 | foundItem2
---------------------------------------------------------------
hallo Adam and Bert | Adam | Bert
Bert says hello to Caesar | Bert | Caesar
I hope this helps to understand my issue. In easy words: "It's just a mapping with multiple equal rows" ;-)
Thanks a lot :)
Upvotes: 0
Views: 599
Reputation: 55
The SELECT statement
from @Mikhail works very well. But when I put it into the UPDATE statement
, I get my well-known error:
"UPDATE/MERGE must match at most one source row for each target row".
The problem occurs, because the SELECT statement
correctly returns duplicates. A simple solution to this issue is to SELECT DISTINCT
. If done so, there is no error anymore.
If more than one regexp should be found, then this query is helpful:
select textWithFoundItemInIt,
ARRAY_TO_STRING(regexp_extract_all(textWithFoundItemInIt, r'(?i)' || mappingItems), " --- ") AS foundItem
from table1, (select string_agg(mappingItem, '|') mappingItems from table2)
I hope my logic using the DISTINCT
statement is fail-proof and feasible in all cases. If anyone has any remarks, I am very happy for feedback.
Upvotes: 0
Reputation: 173046
Consider below approach
select textWithFoundItemInIt,
regexp_extract(textWithFoundItemInIt, r'(?i)' || mappingItems) foundItem
from table1, (select string_agg(mappingItem, '|') mappingItems from table2)
if applied to sample data in your question - output is
Upvotes: 2