Valli69

Reputation: 9892

Convert array of rows into array of strings in pyspark

I have a dataframe with 2 columns, and calling df.collect() on it gives me the array below.

array = [Row(name=u'Alice', age=10), Row(name=u'Bob', age=15)]

Now I want to get an output array like the one below.

new_array = ['Alice', 'Bob']

Could anyone please let me know how to extract the above output using pyspark? Any help would be appreciated.

Thanks

Upvotes: 1

Views: 7190

Answers (2)

cph_sto

Reputation: 7585

# Creating the base dataframe.
values = [('Alice',10),('Bob',15)]
df = sqlContext.createDataFrame(values,['name','age'])
df.show()
    +-----+---+
    | name|age|
    +-----+---+
    |Alice| 10|
    |  Bob| 15|
    +-----+---+

df.collect()
    [Row(name='Alice', age=10), Row(name='Bob', age=15)]

# Use list comprehensions to create a list.
new_list = [row.name for row in df.collect()]
print(new_list)
    ['Alice', 'Bob']
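
If the rows have already been collected, as in the question, the same comprehension can be wrapped in a small helper. This is only a minimal sketch: column_values is a hypothetical helper name, and a namedtuple stands in for pyspark.sql.Row so the snippet runs without a Spark session. Attribute access by field name works the same way on real Row objects.

```python
from collections import namedtuple


def column_values(rows, field):
    """Return the values of one named field from an iterable of row-like objects."""
    return [getattr(row, field) for row in rows]


# Stand-in for pyspark.sql.Row (an assumption, used only so this runs without Spark).
Row = namedtuple("Row", ["name", "age"])

# Same shape as the result of df.collect() in the question.
array = [Row(name="Alice", age=10), Row(name="Bob", age=15)]

new_array = column_values(array, "name")
print(new_array)  # ['Alice', 'Bob']
```

With a real Row, a field whose name collides with a tuple method (for example count or index) is safer to read as row['count'] than through attribute access.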

Upvotes: 3

Jim Todd

Reputation: 1588

I see two columns, name and age, in the df, and you want only the name column to be displayed.

You can select it like:

df.select("name").show()

This will show you only the names.

Tip: Also, you can use df.show() instead of df.collect(). That will show the data in tabular form instead of as Row(...) objects.

Upvotes: 0
