Lilienthal
Lilienthal

Reputation: 4378

Remove columns by name based on pattern

How can I remove a large number of columns by name based on a pattern?

A data set exported from Jira has a ton of extra columns that I've no interest in. 400 Log entries, 50 Comments, dozens of links or attachments. Problem is that they get random numbers assigned which means that removing them with hardcoded column names will not work. That would look like this and break as the numbers change:

= Table.RemoveColumns(#"Previous Step",{"Watchers", "Watchers_10", "Watchers_11", "Watchers_12", "Watchers_13", "Watchers_14", "Watchers_15", "Watchers_16", "Watchers_17", "Watchers_18", "Watchers_19", "Watchers_20", "Watchers_21", "Watchers_22", "Watchers_23", "Watchers_24", "Watchers_25", "Watchers_26", "Watchers_27", "Watchers_28", "Log Work", "Log Work_29", "Log Work_30", "Log Work_31", "Log Work_32", ...

How can I remove a large number of columns by using a pattern in the name? i.e. remove all "Log Work" columns.

Upvotes: 1

Views: 1379

Answers (1)

Lilienthal
Lilienthal

Reputation: 4378

The best way I've found is to use List.FindText on Table.ColumnNames to get a list of column names dynamically based on target string:

= Table.RemoveColumns(#"Previous Step", List.FindText(Table.ColumnNames(#"Previous Step"), "Log Work")

This works by first grabbing the full list of Column Names and keeping only the ones that match the search string. That's then sent to RemoveColumns as normal.

Limitation appears to be that FindText doesn't offer complex pattern matching.

Of course, when you want to remove a lot of different patterns, having individual steps isn't very interesting. A way to combine this is to use List.Combine to join the resulting column names together.

That becomes:

= Table.RemoveColumns(L, List.Combine({ List.FindText(Table.ColumnNames(L), "Watchers_"), List.FindText(Table.ColumnNames(L), "Log Work"), List.FindText(Table.ColumnNames(L), "Comment"), List.FindText(Table.ColumnNames(L), "issue link"), List.FindText(Table.ColumnNames(L), "Attachment")} ))

SO what's actually written there is:

Table.RemoveColumns(PreviousStep, List.Combine({ foundList1, foundlist2, ... }))

Note the { } that signifies a list! You need to use this as List.Combine only accepts a single argument which is itself already a List of lists. And the Combine call is required here.

Also note the L here instead of #"Previous Step". That's used to make the entire thing more readable. Achieved by inserting a step named "L" that just has = #"Promoted Headers".

This allows relatively maintainable removal of multiple columns by name, but it's far from perfect.

Upvotes: 2

Related Questions