Convert array of rows into array of strings in pyspark

Question

I have a dataframe with 2 columns, and I got the array below by calling df.collect().

array = [Row(name=u'Alice', age=10), Row(name=u'Bob', age=15)]

Now I want to get an output array like the one below.

new_array = ['Alice', 'Bob']

Could anyone please let me know how to extract the above output using pyspark? Any help would be appreciated.

Thanks

cph_sto · Accepted Answer

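# Creating the base dataframe.
# (sqlContext is predefined in older PySpark shells; with a SparkSession named spark, use spark.createDataFrame instead.)
values = [('Alice', 10), ('Bob', 15)]
df = sqlContext.createDataFrame(values, ['name', 'age'])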
df.show()
    +-----+---+
    | name|age|
    +-----+---+
    |Alice| 10|
    |  Bob| 15|
    +-----+---+

df.collect()
    [Row(name='Alice', age=10), Row(name='Bob', age=15)]

# Use a list comprehension to pull the name field out of each Row.
new_list = [row.name for row in df.collect()]
print(new_list)
    ['Alice', 'Bob']
