Nag

Reputation: 2057

pyspark - what is the real use of "col" function

I have yet to find the real use of the "col" function. So far I see the same result with or without it. Can someone elaborate on a use case that can only be done with the "col" function?

Both snippets below return the same result. So what is the real need for the "col" function? I understand from the documentation that it returns a Column.

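from pyspark.sql.functions import col, upper

# Passing column names as plain strings
employeesDF. \
    select(upper("first_name"), upper("last_name")). \
    show()

# Passing Column objects built with col()
employeesDF. \
    select(upper(col("first_name")), upper(col("last_name"))). \
    show()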

Upvotes: 0

Views: 685

Answers (1)

MikeK

Reputation: 41

Some functions accept either a column name (a string) or a Column object as input, as select and upper do in your example. A select always returns a DataFrame of columns, so supporting both input types makes sense there. Selecting by plain column name is the more common style.

In many situations, though, there is a big difference between a column name passed as a string and col("columnName"), and you have to be explicit. For example, say you have something like:

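when(col("my_col").isNotNull(), col("my_col")).otherwise("other_col")

In that expression you would be returning the literal string "other_col" when "my_col" is null, instead of the value from the "other_col" column. To get the column's value you have to write otherwise(col("other_col")).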

Upvotes: 3
