Reputation: 19
I have a DataFrame with the following columns:
id 1id id2 ac1 2ac tre tye
I want to drop every column whose name contains "id" or "ac" and keep the others.
How can I do this in PySpark?
I tried select statements, but they didn't work.
How should I use a regex on the column names here?
Upvotes: 0
Views: 133
Reputation: 32690
Use a simple list comprehension:
Using select:
from pyspark.sql.functions import col
df.select(*[col(c) for c in df.columns if not ("id" in c or "ac" in c)]).show()
Using drop:
df.drop(*[c for c in df.columns if "id" in c or "ac" in c]).show()
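Since the question asks about regex specifically, here is a minimal sketch of the same filtering driven by a regular expression. It uses only Python's standard `re` module on a plain list of names. The helper `split_columns` is a hypothetical name introduced for illustration, not part of PySpark.

```python
import re


def split_columns(columns, pattern):
    """Hypothetical helper: split column names into (kept, dropped).

    A name is dropped if `pattern` matches anywhere in it (re.search),
    so r"id|ac" behaves like the `"id" in c or "ac" in c` test above.
    """
    regex = re.compile(pattern)
    dropped = [c for c in columns if regex.search(c)]
    kept = [c for c in columns if not regex.search(c)]
    return kept, dropped


# Column names from the question
columns = ["id", "1id", "id2", "ac1", "2ac", "tre", "tye"]
kept, dropped = split_columns(columns, r"id|ac")
print(kept)     # ['tre', 'tye']
print(dropped)  # ['id', '1id', 'id2', 'ac1', '2ac']
```

With a real DataFrame you would pass `df.columns` as the first argument and then call `df.drop(*dropped)` or `df.select(*kept)`. The regex form is handy when the rule gets stricter, for example `r"^(id|ac)"` to drop only names that start with those prefixes.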
Upvotes: 1