Reputation: 602
My requirement is to pass the list of all column names as a parameter to dataframe.select(columns).
First, I use dataframe.columns to get the list of all the columns, which I store as list1 = dataframe.columns
Second, I use .join to convert my list of columns into a string, like strList1 = "','".join(list1), which gives me the following output:
col1','col2','col3','col4
As you can see, the single quotes at the start and end of the string are missing.
To fix strList1, I use f-string formatting as follows:
strList2 = f"'{strList1}'"
which gives me the following output:
'col1','col2','col3','col4'
The main problem:
Passing strList2 as a parameter in dataframe.select(strList2) gives me the following error:
Py4JJavaError: An error occurred while calling o5813.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`col1','col2','col3','col4'`' given input columns: [ col1, col2, db.table.col4, db.table.col3];; 'Project ['col1','col2','col3','col4']
Note: there is a backquote before col1 and after col4.
I don't understand why the select function isn't accepting a proper string value / variable (strList2).
Upvotes: 1
Views: 1005
Reputation: 42422
You are passing the string "'col1','col2','col3','col4'" to df.select. There is no column called "'col1','col2','col3','col4'", so df.select("'col1','col2','col3','col4'") will result in an error.
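To make the point concrete, here is a small plain-Python sketch (no Spark needed) of what the join and f-string steps produce. The list literal is only a stand-in for dataframe.columns; the variable names are illustrative.

```python
# Stand-in for dataframe.columns (assumed column names from the question).
columns = ["col1", "col2", "col3", "col4"]

# Joining the names and wrapping the result in quotes builds ONE string.
joined = "','".join(columns)
quoted = f"'{joined}'"

print(type(quoted).__name__)   # the result is a str, not a list
print(quoted)                  # a single value that merely looks like four names
print(quoted in columns)       # it does not match any actual column name
```

Because select receives that single string, it looks for one column with that exact name, which does not exist.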
Instead of passing that string, you need to pass a list of column-name strings to df.select. df.columns is already that list (e.g. ['col1', 'col2']), so you can simply do df.select(df.columns).
In fact, if you simply want to show all columns, you can do df.select('*').
Upvotes: 1