DespicableMe

Reputation: 61

Use regex to filter columns (by name) of a PySpark DataFrame

I have a Spark DataFrame with 3k-4k columns, and I'd like to drop the columns whose names meet certain variable criteria, e.g. WHERE ColumnName LIKE 'foo'.

Upvotes: 4

Views: 8122

Answers (1)

Mariusz

Reputation: 13926

To get the column names, use df.columns; drop() accepts several column names in a single call. The code below combines the two and does what you need:

# Collect every column whose name contains the substring 'foo', then drop them all in one call
to_drop = [c for c in df.columns if 'foo' in c]
new_df = df.drop(*to_drop)
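Since the question asks about regex rather than a plain substring test, the same idea can be sketched with Python's re module. This is a minimal sketch, not part of the original answer: `columns_matching` is a hypothetical helper name, and a plain list stands in for df.columns so the example runs without Spark. It uses re.search semantics, so the pattern may match anywhere in the name unless you anchor it with ^ and $.

```python
import re
from typing import Iterable, List


def columns_matching(columns: Iterable[str], pattern: str) -> List[str]:
    """Return the names in `columns` where `pattern` matches anywhere (re.search semantics)."""
    regex = re.compile(pattern)
    return [name for name in columns if regex.search(name)]


# Hypothetical stand-in for df.columns so the sketch runs without Spark.
example_columns = ["id", "foo_1", "bar", "FOO_2", "barfoo"]

# Case-sensitive substring-style match: prints ['foo_1', 'barfoo']
print(columns_matching(example_columns, r"foo"))

# Anchored, case-insensitive match: prints ['foo_1', 'FOO_2']
print(columns_matching(example_columns, r"(?i)^foo_\d+$"))

# With a real PySpark DataFrame `df`, the result would be passed to drop():
#     new_df = df.drop(*columns_matching(df.columns, r"^foo"))
```

Compiling the pattern once keeps the check cheap even with a few thousand column names, and the result is an ordinary list, so it can be inspected before anything is dropped.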

Upvotes: 8
