Reputation: 258
Is it possible to extract all the rows of a specific column into an array-like container?
I want to extract the column and then reshape it as an array. The column I am trying to extract is currently of type udt.
I tried to use
my_array = df.select(df['my_col'])
but this is not correct, as it gives me a list.
Upvotes: 1
Views: 3275
Reputation: 444
collect_list() gives you an array of values.
A. If you want to collect all the values of a column, say c2, based on another column, say c1, you can group by c1 and collect the values of c2 using collect_list:
df = spark.createDataFrame([
    ('emma', 'math'),
    ('emma', 'english'),
    ('mia', 'english'),
    ('mia', 'science'),
    ('mona', 'math'),
    ('mona', 'geography'),
], ["student", "subject"])
from pyspark.sql.functions import collect_list
df1 = df.groupBy('student').agg(collect_list('subject'))
df1.show()
B. If you want all values of c2 irrespective of any other column, you can group by a literal:
from pyspark.sql.functions import lit
df1 = df.groupBy(lit(1)).agg(collect_list('subject'))
df1.show()
Upvotes: 2