Mohseen Mulla

Reputation: 602

How to convert all the contents of a list to individual strings to pass as a parameter in dataframe.select()?

My requirement is to pass the list of all column names as a parameter in dataframe.select(columns)

  1. First, I use dataframe.columns to get the list of all the columns, which I store as list1 = dataframe.columns

  2. Next, I use .join to convert my list of columns into a single string, strList1 = "','".join(list1), which gives me the following output

    col1','col2','col3','col4

    As you can see, the single quotes at the start and end of the string are missing

  3. To fix strList1, I use f-string formatting as follows, strList2 = f"'{strList1}'", which gives me the following output

    'col1','col2','col3','col4'

The main problem:

When I pass strList2 as a parameter, dataframe.select(strList2) gives me the following error

Py4JJavaError: An error occurred while calling o5813.select. : org.apache.spark.sql.AnalysisException: cannot resolve 'backquotecol1','col2','col3','col4'backquote' given input columns: [ col1, col2, db.table.col4, db.table.col3];; 'Project ['col1','col2','col3','col4']

Note - There is a backquote before col1 and after col4

I don't understand why the select function isn't taking a proper string value / variable (strList2)

Upvotes: 1

Views: 1005

Answers (1)

mck

Reputation: 42422

You are passing a string "'col1','col2','col3','col4'" to df.select. There is no such column called "'col1','col2','col3','col4'", therefore df.select("'col1','col2','col3','col4'") will result in an error.
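To see why, here is a small plain-Python sketch of the steps from the question. The list literal is only a stand-in for dataframe.columns (an assumption for illustration); the variable names are the ones used in the question.

```python
# Stand-in for dataframe.columns (assumed values, for illustration only).
list1 = ["col1", "col2", "col3", "col4"]

# Steps 2 and 3 from the question: join the names, then wrap in quotes.
strList1 = "','".join(list1)
strList2 = f"'{strList1}'"

# The result is ONE string that merely looks like several quoted names.
print(type(strList2).__name__)
print(strList2)

# The original list is already a sequence of separate name strings.
print(len(list1))
```

However the quotes are arranged, strList2 stays a single str object, so select treats it as one column name rather than four.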

Instead of passing that string, what you need to do is to pass in a list of strings of column names to df.select. df.columns is already that list (e.g. ['col1', 'col2']), so you can simply do df.select(df.columns).

In fact, if you simply want to show all columns, you can do df.select('*').
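The options above could be sketched roughly as follows. The helper name select_by_names, the local SparkSession settings, and the sample data are assumptions made up for this example, not part of the question; demo() needs a working PySpark installation and is not called automatically.

```python
def select_by_names(df, names):
    """Select columns given a list of column-name strings.

    Hypothetical helper: unpacks the list so each name is passed
    to select as its own argument.
    """
    return df.select(*names)


def demo():
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    # Assumed local session and sample data, for illustration only.
    spark = SparkSession.builder.master("local[1]").appName("select-demo").getOrCreate()
    df = spark.createDataFrame([(1, 2, 3, 4)], ["col1", "col2", "col3", "col4"])

    select_by_names(df, df.columns).show()  # names unpacked as separate arguments
    df.select(df.columns).show()            # select also accepts the list itself
    df.select("*").show()                   # all columns

    spark.stop()
```

Calling demo() in an environment with PySpark available should show the same four columns three times.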

Upvotes: 1
