Lynchie

Reputation: 1149

Drop columns from dataframe where last three characters equal "Nav"

I wish to select only particular columns from a dataframe; the columns I don't want all end with "Nav".

How can I accomplish this?

I've tried something similar to the following:

jsonDF2.select([c for c in jsonDF2.columns if c not in {'%Nav'}])

Any advice would be appreciated.

UPDATE

Currently using

#jsonDF2 = jsonDF2.select("d.*")

because I'm exploding some JSON that is nested in "d". Using blackbishop's code on its own, it currently places all the JSON within one column instead of multiple. Screenshots of an example are below:

Code Used & Result:

jsonDF2 = jsonDF2.select("d.*")

(screenshot: result of select("d.*"))

Suggested Code

jsonDF2.select(*[F.col(c) for c in jsonDF2.columns if not c.endswith("Nav")])

(screenshot: result of the suggested code)

I've tried placing the d. prior to the * in the suggested code, but got no joy. I know that "F" is obviously from the import. I also tried placing the d. before the "c", with no joy either.
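One way to reconcile the two steps (a sketch, not the asker's confirmed solution): flatten the struct first with select("d.*"), then filter the resulting top-level column names; or filter the struct's field names before selecting. The field names below are illustrative stand-ins, and the Spark calls are shown only in comments so the name-building logic can run on its own:

```python
# Two-step approach (assumes jsonDF2 has a struct column "d"):
#   flat = jsonDF2.select("d.*")
#   flat = flat.select(*[F.col(c) for c in flat.columns if not c.endswith("Nav")])
#
# Alternative: build qualified "d.<field>" names up front, skipping *Nav fields.
# Pure-Python demo (stand-in for jsonDF2.schema["d"].dataType.names):
fields = ["Price", "PriceNav", "Date"]
selected = [f"d.{name}" for name in fields if not name.endswith("Nav")]
print(selected)  # ['d.Price', 'd.Date']
# Then: jsonDF2.select(*selected)
```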

Upvotes: 0

Views: 166

Answers (3)

Shubham Sharma

Reputation: 71687

DataFrame.colRegex

You can use colRegex, which is available in Spark >= 2.3:

df.select(df.colRegex('`.*(?<!Nav)`'))
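The backtick-quoted pattern uses a negative lookbehind to match any column name that does not end in "Nav". You can sanity-check the regex itself with plain `re` before handing it to colRegex (the column names here are hypothetical):

```python
import re

# Negative lookbehind: match the whole name only if its last
# three characters are not "Nav".
pattern = re.compile(r".*(?<!Nav)$")
cols = ["Price", "PriceNav", "Date"]
kept = [c for c in cols if pattern.fullmatch(c)]
print(kept)  # ['Price', 'Date']
```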

Upvotes: 0

Nohman

Reputation: 454

This should do it

[c for c in jsonDF2.columns if c[-3:] != 'Nav']
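Note that this comprehension yields only the column *names*; you still need to pass them to select() to get a filtered DataFrame. A pure-Python demo of the name filtering (the column names are made up for illustration):

```python
# Slicing the last three characters and comparing against "Nav"
# keeps every name that doesn't end with that suffix.
columns = ["Price", "PriceNav", "Date", "ValueNav"]
keep = [c for c in columns if c[-3:] != "Nav"]
print(keep)  # ['Price', 'Date']
# With Spark: jsonDF2.select(keep)
```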

Upvotes: 0

blackbishop

Reputation: 32690

Try this:

from pyspark.sql import functions as F

jsonDF2.select(*[F.col(c) for c in jsonDF2.columns if not c.endswith("Nav")])

Upvotes: 1
