Nick Ryan

Reputation: 19

PySpark: Deleting columns whose names contain a substring

I have a dataframe whose columns look like this:

id   1id  id2  ac1  2ac tre tye

I want to drop the columns whose names contain "id" or "ac" and keep the rest.

How can I achieve this in PySpark?

I tried select statements, but that didn't work.

How should I use a regexp on the column names here?

Upvotes: 0

Views: 133

Answers (1)

blackbishop

Reputation: 32690

Use a simple list comprehension:

  • Using Select

    from pyspark.sql.functions import col

    df.select(*[col(c) for c in df.columns if not ("id" in c or "ac" in c)]).show()
    
  • Using Drop

    df.drop(*[c for c in df.columns if "id" in c or "ac" in c]).show()
    

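The filtering logic in both snippets is plain Python string matching on `df.columns`, so it can be checked without a Spark session. A minimal sketch using the column names from the question (the `cols` list stands in for `df.columns`):

```python
# Column names taken from the question; in real code this would be df.columns
cols = ["id", "1id", "id2", "ac1", "2ac", "tre", "tye"]

# Columns to keep: names containing neither "id" nor "ac" (the select variant)
kept = [c for c in cols if not ("id" in c or "ac" in c)]
print(kept)  # → ['tre', 'tye']

# Columns to remove: names containing "id" or "ac" (the drop variant)
dropped = [c for c in cols if "id" in c or "ac" in c]
print(dropped)  # → ['id', '1id', 'id2', 'ac1', '2ac']
```

Both comprehensions are two sides of the same predicate, so `kept` plus `dropped` always covers every column exactly once.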
Upvotes: 1
