Reputation: 357
I have the following schema after executing df.printSchema():
root
|-- key:col1: string (nullable = true)
|-- key:col2: string (nullable = true)
|-- col3: string (nullable = true)
|-- col4: string (nullable = true)
|-- col5: string (nullable = true)
I need to access key:col2 by column name, but the following line gives an error because of the : in the name:
df.map(lambda row:row.key:col2)
I have also tried
df.map(lambda row: row["key:col2"])
I can easily obtain values from col3, col4, and col5 using
df.map(lambda row: row.col4).take(10)
Upvotes: 1
Views: 1608
Reputation: 310227
I think you can probably use getattr:
df.map(lambda row: getattr(row, 'key:col2'))
I'm not an expert in pyspark, so I don't know if this is the best way or not :-).
You might also be able to use operator.attrgetter:
from operator import attrgetter
df.map(attrgetter('key:col2'))
IIRC, attrgetter performs slightly better than a lambda in some situations. That's probably more pronounced here than usual because it avoids the global getattr name lookup, and in this case I think it reads a bit more nicely too.
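You can check the trick outside Spark with a plain Python object, since the issue is really about Python attribute syntax, not Spark itself. Here's a minimal sketch using types.SimpleNamespace as a stand-in for a Row (the field values are made up):

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-in for a Row: an object whose field name contains a colon.
# Plain attribute access like row.key:col2 is a Python syntax error.
row = SimpleNamespace(**{"key:col2": "value2", "col3": "value3"})

# getattr accepts any string, regardless of characters in the name
print(getattr(row, "key:col2"))  # -> value2

# attrgetter builds a reusable callable, convenient for map()
get_col = attrgetter("key:col2")
print(get_col(row))              # -> value2
```

The same getattr / attrgetter calls work on pyspark Row objects because Row fields are exposed as attributes.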
Upvotes: 1