Reputation: 357
I have the following schema after executing df.printSchema():
root
|-- key:col1: string (nullable = true)
|-- key:col2: string (nullable = true)
|-- col3: string (nullable = true)
|-- col4: string (nullable = true)
|-- col5: string (nullable = true)
I need to access key:col2 by column name, but the following line gives an error because of the : in the name:
df.map(lambda row:row.key:col2)
I have also tried
df.map(lambda row: row["key:col2"])
I can easily obtain values from col3, col4, and col5 using
df.map(lambda row: row.col4).take(10)
Upvotes: 1
Views: 1608
Reputation: 310227
I think you can probably use getattr:
df.map(lambda row: getattr(row, 'key:col2'))
I'm not an expert in pyspark, so I don't know if this is the best way or not :-).
You might also be able to use operator.attrgetter:
from operator import attrgetter
df.map(attrgetter('key:col2'))
IIRC, attrgetter performs slightly better than a lambda in some situations. That's probably more pronounced here than usual because it avoids the global getattr name lookup, and in this case I think it reads a bit more nicely too.
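You can check the trick outside Spark with a plain Python object, since the issue is really about Python attribute syntax, not Spark itself. Here's a minimal sketch using types.SimpleNamespace as a stand-in for a Row (the field values are made up):

```python
from operator import attrgetter
from types import SimpleNamespace

# Stand-in for a Row: an object whose field name contains a colon.
# Plain attribute access like row.key:col2 is a Python syntax error.
row = SimpleNamespace(**{"key:col2": "value2", "col3": "value3"})

# getattr accepts any string, regardless of characters in the name
print(getattr(row, "key:col2"))  # -> value2

# attrgetter builds a reusable callable, convenient for map()
get_col = attrgetter("key:col2")
print(get_col(row))              # -> value2
```

The same getattr / attrgetter calls work on pyspark Row objects because Row fields are exposed as attributes.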
Upvotes: 1