Qqqqq
Qqqqq

Reputation: 175

Power Query - Remove text strings that contain lower case letters

Situation: I am working in Power Query with data that was imported from a pdf file, and it came in a bit messy. I have a column that contains numbers, as well as text strings. Some of the text strings are mixed-case, containing both upper and lower case characters, while others contain only upper case characters.

Goal: I want to remove all numbers and all mixed-case text strings. The end result should show only the text strings that are completely upper case.

For example, I would want my end result to include things like IRA, IRREVOCABLE TRUST, CHARITABLE TRUST, but replace things like Number of Accounts, Totals, 14 with null.

What I have tried so far:

  1. The following gets rid of numbers and lower case characters, but it doesn't quite work since it keeps upper case characters included in mixed-case character strings.

    Table.AddColumn(#"Added Custom2", "Account Type" each Text.Select([AccountType], {"A".."Z"," "}), type text)

  2. The following code gets rid of the mixed-case text strings, but it doesn't quite work because it doesn't remove the numbers. Also, it is too specific, requiring me to remove strings that contain specific words. I'd prefer to remove all strings that contain lower case characters.

    Table.AddColumn(#"Added Custom2", "Account Type", each if [AccountType]= null or Text.Contains([AccountType],"Totals") or Text.Contains([AccountType],"of") or Text.Contains([AccountType],"report") then null else [AccountType])

Your insights would be appreciated. I'm a new PowerQuery user, so please be specific and detailed with your responses.

Upvotes: 0

Views: 2175

Answers (2)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60494

Hard to know exactly what you want since you provide no examples of your data.

  • If you have multiple (space-separated) strings in a cell, then you can use:
   #"Added Custom" = Table.AddColumn(#"Changed Type", "allCaps", each 
        Text.Combine(
            List.Accumulate(Text.Split([Column1]," "),
                            {},
                            (state, current)=>  
                                if List.ContainsAny(
                                        Text.ToList(current),
                                            {"0".."9","a".."z",",",":","?","/","\"," "}) 
                                then state 
                                else state & {current}),", "))
  • If you just have a single string in a cell, then you can use
  #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
        if List.ContainsAny(
                Text.ToList([Column1]),
                {"0".."9","a".."z",",",":","?","/","\"," "}) 
        then null 
        else [Column1])

Then you can filter by deselecting the null in the added column

In each case, #"Changed Type" is the previous step. If that is not the case in your code, replace with the actual name of your previous step

Upvotes: 1

horseyride
horseyride

Reputation: 21428

First formula just looks for all capitals that dont contain a number

= if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null

Second formula removes all numbers, THEN looks for all capitals that dont contain a number

= if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null

enter image description here

let  Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if [Column1] = Text.Remove ([Column1],{"0".."9","a".."z"}) then [Column1] else null),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom2", each if Text.Remove ([Column1],{"0".."9"}) = Text.Remove ([Column1],{"a".."z","0".."9"}) then Text.Remove ([Column1],{"0".."9"}) else null)
in  #"Added Custom1"

~ ~ ~

If you are looking to parse words out of a list of words.......

This retains words that (a) have no numbers before after or within, and (b) are all in capitals

enter image description here

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
Text.Combine(
    List.RemoveNulls(
        List.Transform(Text.Split([Column1]," "), each 
        if  _ = Text.Remove (_,{"a".."z","0".."9"}) then _ else null
        ))," "))
in #"Added Custom"

This removes all numbers, and after that retains words that are all in capitals

enter image description here

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each 
Text.Combine(
    List.RemoveNulls(
        List.Transform(Text.Split(Text.Remove ([Column1],{"0".."9"})," "), each 
        if  _ = Text.Remove (_,{"a".."z"}) then _ else null
        ))," "))
in #"Added Custom"

Upvotes: 0

Related Questions