Paul

Reputation: 33

Save multiple values in table

Given:

I read each column from the dataframe and call the function with the column as a parameter.

The output should be saved as a table. How can I achieve this?

Upvotes: 0

Views: 292

Answers (2)

pasha701

Reputation: 7207

If the function returns values of the same type, in Scala:

// required imports
import org.apache.spark.sql.functions.{col, explode, udf}
import spark.implicits._

// function: splits a comma-separated string, wrapped in an outer array
val mySplit = (value: String) => Array(value.split(","))
val mySplitUDF = udf(mySplit(_: String))

// data
val initialDF = spark.sparkContext.parallelize(List("First,Second,Third")).toDF("initialColumn")

// transformations
val arrayDF = initialDF.select(mySplitUDF(col("initialColumn")).as("arrayColumn"))
val explodedDF = arrayDF.select(explode(col("arrayColumn")).as("explodedCol"))

val resultDF = explodedDF.select(
  col("explodedCol").getItem(0).as("Col1"),
  col("explodedCol").getItem(1).as("Col2"),
  col("explodedCol").getItem(2).as("Col3")
)
resultDF.show(false)

Result is:

+-----+------+-----+
|Col1 |Col2  |Col3 |
+-----+------+-----+
|First|Second|Third|
+-----+------+-----+

In Python this can be implemented in a similar way.

Upvotes: 1

Pushkr

Reputation: 3619

First, build a sample dataframe:

from pyspark.sql import Row
df = sc.parallelize(['a', 'b', 'c']).map(lambda row: Row(key=row)).toDF()
df.show()

Output:

+---+
|key|
+---+
|  a|
|  b|
|  c|
+---+

Then define a function that returns 5 values per row:

def func(args):
    # build 5 derived values per row, joined into one result string
    lista = Row(result=",".join([args.key + str(x) for x in range(5)]))
    return lista

new_table = df.rdd.map(func).toDF() 
new_table.show()

Output:

+--------------+
|        result|
+--------------+
|a0,a1,a2,a3,a4|
|b0,b1,b2,b3,b4|
|c0,c1,c2,c3,c4|
+--------------+

Finally, save the result as a table:

new_table.write.saveAsTable("results")

Upvotes: 1
