Reputation: 1149
I want to select only particular columns from a DataFrame, but the columns I don't want all end with "Nav".
How can I accomplish this?
I've tried something like the below:
jsonDF2.select([c for c in jsonDF2.columns if c not in {'%Nav'}])
Any advice would be appreciated.
UPDATE
Currently I'm using
jsonDF2 = jsonDF2.select("d.*")
because I'm flattening some JSON that is nested under "d". Using blackbishop's code on its own, all of the JSON ends up in a single column instead of being split into multiple columns. Example below:
Code Used & Result:
jsonDF2 = jsonDF2.select("d.*")
Suggested Code
jsonDF2.select(*[F.col(c) for c in jsonDF2.columns if not c.endswith("Nav")])
I've tried placing "d." before the * in the suggested code, but got no joy. I know that "F" is obviously from the import. I also tried placing "d." before the "c", again with no joy.
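In other words, I think I need to chain the two selects, roughly as sketched below (flatten "d" first, then drop the Nav columns), but I'm not sure this is the right way to combine them:
from pyspark.sql import functions as F
# flatten the struct nested under "d" first
jsonDF2 = jsonDF2.select("d.*")
# then keep only the flattened columns that don't end with "Nav"
jsonDF2 = jsonDF2.select(*[F.col(c) for c in jsonDF2.columns if not c.endswith("Nav")])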
Upvotes: 0
Views: 166
Reputation: 71687
DataFrame.colRegex
You can use colRegex, which is available in Spark >= 2.3:
df.select(df.colRegex('`.*(?<!nav)`'))
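For example, with some made-up column names (column name matching is case-insensitive by default, so the lowercase "nav" in the pattern still catches columns ending in "Nav"):
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 10.0, 11.0)], ["fund", "price", "priceNav"])
# keep only columns whose names don't end with "nav"/"Nav"
df.select(df.colRegex('`.*(?<!nav)`')).columns
# ['fund', 'price']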
Upvotes: 0
Reputation: 32690
Try this:
from pyspark.sql import functions as F
jsonDF2.select(*[F.col(c) for c in jsonDF2.columns if not c.endswith("Nav")])
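Just to illustrate what the comprehension keeps, with some hypothetical column names:
cols = ["fund", "price", "priceNav", "dateNav"]
print([c for c in cols if not c.endswith("Nav")])
# ['fund', 'price']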
Upvotes: 1