Reputation: 175
Situation: I am working in Power Query with data that was imported from a pdf file, and it came in a bit messy. I have a column that contains numbers, as well as text strings. Some of the text strings are mixed-case, containing both upper and lower case characters, while others contain only upper case characters.
Goal: I want to remove all numbers and all mixed-case text strings. The end result should show only the text strings that are completely upper case.
For example, I would want my end result to include things like IRA, IRREVOCABLE TRUST, CHARITABLE TRUST, but replace things like Number of Accounts, Totals, 14 with null.
What I have tried so far:
The following gets rid of numbers and lower case characters, but it doesn't quite work since it keeps upper case characters included in mixed-case character strings.
Table.AddColumn(#"Added Custom2", "Account Type" each Text.Select([AccountType], {"A".."Z"," "}), type text)
The following code gets rid of the mixed-case text strings, but it doesn't quite work because it doesn't remove the numbers. Also, it is too specific, requiring me to remove strings that contain specific words. I'd prefer to remove all strings that contain lower case characters.
Table.AddColumn(#"Added Custom2", "Account Type", each if [AccountType]= null or Text.Contains([AccountType],"Totals") or Text.Contains([AccountType],"of") or Text.Contains([AccountType],"report") then null else [AccountType])
Your insights would be appreciated. I'm a new PowerQuery user, so please be specific and detailed with your responses.
Upvotes: 0
Views: 2175
Reputation: 60494
Hard to know exactly what you want since you provide no examples of your data.
#"Added Custom" = Table.AddColumn(#"Changed Type", "allCaps", each
Text.Combine(
List.Accumulate(Text.Split([Column1]," "),
{},
(state, current)=>
if List.ContainsAny(
Text.ToList(current),
{"0".."9","a".."z",",",":","?","/","\"," "})
then state
else state & {current}),", "))
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
if List.ContainsAny(
Text.ToList([Column1]),
{"0".."9","a".."z",",",":","?","/","\"," "})
then null
else [Column1])
Then you can filter by deselecting the null in the added column
In each case, #"Changed Type"
is the previous step. If that is not the case in your code, replace with the actual name of your previous step
Upvotes: 1
Reputation: 21428
First formula just looks for all capitals that dont contain a number
= if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null
Second formula removes all numbers, THEN looks for all capitals that dont contain a number
= if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null
let Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom2", each if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null)
in #"Added Custom1"
~ ~ ~
If you are looking to parse words out of a list of words.......
This retains words that (a) have no numbers before after or within, and (b) are all in capitals
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
Text.Combine(
List.RemoveNulls(
List.Transform(Text.Split([Column1]," "), each
if _ = Text.Remove (_,{"a".."z","0".."9"}) then _ else null
))," "))
in #"Added Custom"
This removes all numbers, and after that retains words that are all in capitals
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each
Text.Combine(
List.RemoveNulls(
List.Transform(Text.Split(Text.Remove ([Column1],{"0".."9"})," "), each
if _ = Text.Remove (_,{"a".."z"}) then _ else null
))," "))
in #"Added Custom"
Upvotes: 0