Reputation: 165
I am trying to create a for loop i which I first: filter a pyspark sql dataframe, then transform the filtered dataframe to pandas, apply a function to it and yied the result in a list called results. My list contains a sequence of strings (that will be sort of ids in the dataframe); I want the for loop to, in each iteration, obtain one of the strings from the list, and filter all the rows in the dataframe whose id is that string. Sample code:
results = []
for x in list:
aux = df.filter("id='x'")
final= function(aux,"value")
results.append(final)
results
The dataframe is a time-series, and outside the loop I apply the aux = df.filter("id='x'")
transformation and then the function runs without problem; the issue is in the loop itself.
However, when I do aux.show() it shows an empty dataframe. The dataframe is a time-series, and outside the loop I apply the aux = df.filter("id='x'")
transformation and then the function runs without problem; the issue is in the loop itself.
Does anyone know why this may be happening?
Upvotes: 1
Views: 2042
Reputation: 42352
Try the code below. x
is not substituted in the filter expression.
results = []
for x in list:
aux = df.filter("id = '%s'" % x)
final= function(aux,"value")
results.append(final)
results
Upvotes: 1