Nick Ryan

Reputation: 19

PySpark: Deleting columns whose names contain a substring

I have a dataframe whose columns look like this:

id   1id  id2  ac1  2ac tre tye

I want to drop the columns whose names contain "id" or "ac" and keep the rest.

How can I achieve this in PySpark?

I tried select statements, but that didn't work.

How should I use a regexp on the column names here?

Upvotes: 0

Views: 133

Answers (1)

blackbishop

Reputation: 32690

Use a simple list comprehension:

  • Using Select

    from pyspark.sql.functions import col

    df.select(*[col(c) for c in df.columns if not ("id" in c or "ac" in c)]).show()
    
  • Using Drop

    df.drop(*[c for c in df.columns if "id" in c or "ac" in c]).show()
    

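The filtering logic in both snippets is plain Python string matching on `df.columns`, so it can be checked without a Spark session. A minimal sketch using the column names from the question (the `cols` list stands in for `df.columns`):

```python
# Column names taken from the question; in real code this would be df.columns
cols = ["id", "1id", "id2", "ac1", "2ac", "tre", "tye"]

# Columns to keep: names containing neither "id" nor "ac" (the select variant)
kept = [c for c in cols if not ("id" in c or "ac" in c)]
print(kept)  # → ['tre', 'tye']

# Columns to remove: names containing "id" or "ac" (the drop variant)
dropped = [c for c in cols if "id" in c or "ac" in c]
print(dropped)  # → ['id', '1id', 'id2', 'ac1', '2ac']
```

Both comprehensions are two sides of the same predicate, so `kept` plus `dropped` always covers every column exactly once.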
Upvotes: 1
