Reputation: 1585
I am trying to build a list comprehension that has an iteration built into it. However, I have not been able to get it to work. What am I doing wrong?
Here is a trivial representation of what I am trying to do.
dataframe columns = ["code_number_1", "code_number_2", "code_number_3", "code_number_4", "code_number_5", "code_number_6", "code_number_7", "code_number_8"]
cols = [0,3,4]
result = df.select([code_number_{f"{x}" for x in cols])
Addendum:
my ultimate goal is to do something like this:
col_buckets = ["code_1", "code_2", "code_3"]
amt_buckets = ["code_1_amt", "code_2_amt", "code_3_amt" ]
result = df.withColumn("max_amt_{col_index}", max(df.select(max(**amt_buckets**) for col_indices of amt_buckets if ***any of col indices of col_buckets*** =='01')))
Upvotes: 4
Views: 10281
Reputation: 31490
[code_number_{f"{x}" for x in cols]
is not valid list comprehension syntax.
Instead try ["code_number_" + str(x) for x in cols], which generates the list of column names ['code_number_0', 'code_number_3', 'code_number_4'].
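Equivalently (assuming Python 3.6+), an f-string inside the comprehension builds the same list without concatenation:

```python
cols = [0, 3, 4]
# Build the column names with an f-string instead of "+" concatenation
names = [f"code_number_{x}" for x in cols]
print(names)  # ['code_number_0', 'code_number_3', 'code_number_4']
```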
.select accepts strings/columns as arguments to select the matching fields from the dataframe.
Example:
from pyspark.sql.functions import col

df = spark.createDataFrame([("a", "b", "c", "d", "e")],
                           ["code_number_0", "code_number_1", "code_number_2", "code_number_3", "code_number_4"])
cols = [0, 3, 4]

#passing strings to select
result = df.select(["code_number_" + str(x) for x in cols])

#or passing columns to select
result = df.select([col("code_number_" + str(x)) for x in cols])

result.show()
#+-------------+-------------+-------------+
#|code_number_0|code_number_3|code_number_4|
#+-------------+-------------+-------------+
#| a| d| e|
#+-------------+-------------+-------------+
Upvotes: 4