Reputation: 373
How can I use a variable inside a lambda function?
for a_name in name_field_names:
    results = sqlContext.sql("SELECT * FROM noise_data")
    stringsDS = results.map(lambda p:p.(a_name))
The lambda function expects a literal column name, whereas I am giving it a variable.
How should I pass the value of the a_name variable to the lambda function?
Upvotes: 0
Views: 899
Reputation: 330073
To get a field from a Row by name, use bracket notation:
from pyspark.sql import Row
row = Row(a = "foo", b = "bar")
row["a"]
'foo'
or getattr:
getattr(row, "b")
'bar'
You can also skip map and use select:
sqlContext.sql("SELECT * FROM noise_data").select(a_name)
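Also remember that Python closures use late binding: a variable referenced inside a function is looked up when the function is called, not when it is defined. Relying on a loop variable from the enclosing scope inside a function created in a loop is therefore not a good idea. If you want map, you should rather bind the value of a_name when the function is created, for example with attrgetter: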
from operator import attrgetter
for a_name in name_field_names:
    results = ...
    # attrgetter(a_name) is built immediately, so it keeps the current value of a_name
    results.rdd.map(attrgetter(a_name))
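The late-binding pitfall can be reproduced without Spark. The following is only a minimal sketch: Record is a hypothetical namedtuple standing in for pyspark.sql.Row, and rows and field_names are made-up sample data, not anything from the question. It compares a plain lambda in a loop with two ways of binding the name eagerly (a default argument and attrgetter).

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for pyspark.sql.Row so the sketch runs without Spark.
Record = namedtuple("Record", ["a", "b"])
rows = [Record(a="foo", b="bar"), Record(a="baz", b="qux")]
field_names = ["a", "b"]  # assumed sample column names

# Late binding: every lambda looks up `name` when it is called,
# so after the loop all of them read the last field.
late = []
for name in field_names:
    late.append(lambda r: getattr(r, name))
late_values = [f(rows[0]) for f in late]

# Eager binding via a default argument: the value is captured at definition time.
bound = []
for name in field_names:
    bound.append(lambda r, name=name: getattr(r, name))
bound_values = [f(rows[0]) for f in bound]

# Eager binding via attrgetter: the getter stores the field name when it is created.
getters = [attrgetter(name) for name in field_names]
getter_values = [g(rows[0]) for g in getters]

print(late_values)    # ['bar', 'bar']
print(bound_values)   # ['foo', 'bar']
print(getter_values)  # ['foo', 'bar']
```

With a real Row the same two fixes should apply, since Row supports attribute access by field name; whether the plain lambda actually misbehaves in Spark depends on when the closure is serialized relative to the loop.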
Upvotes: 1