meerkatopera

Reputation: 25

drop all df2.columns from another df (pyspark.sql.dataframe.DataFrame specific)

I have a large DF (pyspark.sql.dataframe.DataFrame) that is the result of multiple joins, plus new columns created from a combination of inputs from different DFs, including DF2.

I want to drop all DF2 columns from DF after I'm done with the join and with creating the new columns based on DF2's input. drop() doesn't accept a list, only a string or a Column.

I know that df.drop("col1", "col2", "coln") will work, but I'd prefer not to crowd the code by listing all 20 columns.

Is there a better way of doing this in pyspark dataframe specifically?

Upvotes: 1

Views: 443

Answers (1)

过过招

Reputation: 4224

# Unpack df2's column names so drop() receives them as individual arguments
drop_cols = df2.columns
df = df.drop(*drop_cols)
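
For context, a minimal self-contained sketch of the same pattern; the DataFrames, column names, and join key below are hypothetical stand-ins, not the asker's actual data:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("drop-df2-columns").getOrCreate()

# Hypothetical stand-ins for DF and DF2
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df2 = spark.createDataFrame([(1, 10), (2, 20)], ["id", "score"])

# Join and derive a new column that uses DF2's input
joined = (
    df.join(df2, on="id", how="left")
      .withColumn("val_score", F.col("score") * 2)
)

# Drop every DF2 column in one call by unpacking the list.
# Note: this also drops the join key ("id" here) because it appears
# in df2.columns; filter it out of the list first if you need to keep it.
result = joined.drop(*df2.columns)
result.show()

Because drop() takes variadic string (or Column) arguments, unpacking the column-name list with * keeps the call compact no matter how many columns DF2 has.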

Upvotes: 4
