Reputation: 17
I would like to ask for your help with the power query-related problem I cannot solve on my own:
One of the columns [address] contains data like this :
Science Park 404, 1098 XH Amsterdam, Netherlands
Laan van Malkenschoten 20, 7333 NP Apeldoorn, NL
plac Europejski 1, 00-844 Warszawa, Poland
Adrianastraat 6, 3014 XL Rotterdam, Netherlands
C. Co Brava, 109, 17411 Vidreres, Girona, Spain
I also have a list of keywords, that contain: Amsterdam, Apeldoorn, Rotterdam
What I need is that power query checks if [address] column contains any of the keywords, and if yes, in a new custom column display ‘MATCH’:
Science Park 404, 1098 XH Amsterdam, Netherlands MATCH
Laan van Malkenschoten 20, 7333 NP Apeldoorn, NL MATCH
plac Europejski 1, 00-844 Warszawa, Poland
Adrianastraat 6, 3014 XL Rotterdam, Netherlands MATCH
C. Co Brava, 109, 17411 Vidreres, Girona, Spain
The list of keywords is stored in the same Excel file, in a one-column table called ‘CITIES_NL’
In theory, I can create a conditional column and hard-code the keywords, however I cannot do it for two reasons:
To achieve the same in Excel, without PQ, I would use this formula:
=IF(--SUMPRODUCT(--ISNUMBER(SEARCH(CITIES_NL,address)))>0,"MATCH","")
however, this has to be done by PQ, as this is only one of the steps for preparing my data
I would appreciate your help!
Michal
Upvotes: 1
Views: 1435
Reputation: 3257
Firstly, add both your ADDRESS
and CITIES_NL
table to power query editor, convert CITIES_NL
into a List, then add a custom column to the ADDRESS
table using the following formula:
=if List.Count(Splitter.SplitTextByAnyDelimiter(CITIES_NL)([Column1]))>1 then "MATCH" else null
The logic is to split
ADDRESS
by any words contained in theCITIES_NL
list, and count how many sub-strings are there. If the result is greater than1
which means at least one key word was found, then use IF to returnMATCH
otherwise(null)
.
Replace [Column1]
with the actual column name in your case.
Here is the full Power Query M code for your reference:
let
Source = Excel.CurrentWorkbook(){[Name="ADDRESS"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if List.Count(Splitter.SplitTextByAnyDelimiter(CITIES_NL)([Column1]))>1 then "MATCH" else "")
in
#"Added Custom"
Please note the match will be case sensitive. If you want a case-insensitive match, you can add an index column to the
ADDRESS
table, make a duplicate of theADDRESS
query, convert bothADDRESS
andCITIES_NL
to lowercase then do the match, and then merge the matched query with the originalADDRESS
query by the index number.
Let me know if you have any questions. Cheers :)
Upvotes: 1
Reputation: 40204
You can do this dynamically by adding a custom column that for each address checks each keyword to see if it is a substring and counts the number of keywords that are a substring of the address. If the count > 0
, then return "MATCH"
.
Here's one way to do this:
= Table.AddColumn(#"Previous Step Reference", "Custom Column Name",
(C) => if List.Count(
List.Select(CITIES_NL[keyword], each Text.Contains(C[address], _))
) > 0
then "MATCH"
else null,
type text)
The syntax is a bit different but the logic is very similar to your Excel formula.
Note that I defined a context C
so that I could reference the address
from that row context within the context of the CITIES_NL[keyword]
list.
Upvotes: 1