SuWon

Reputation: 23

Array of String to Array of Struct in Scala + Spark

I am currently using Spark and Scala 2.11.8

I have the following schema:

root
|-- partnumber: string (nullable = true)
|-- brandlabel: string (nullable = true)
|-- availabledate: string (nullable = true)
|-- descriptions: array (nullable = true)
|    |-- element: string (containsNull = true)

I am trying to use a UDF to convert it to the following:

root
|-- partnumber: string (nullable = true)
|-- brandlabel: string (nullable = true)
|-- availabledate: string (nullable = true)
|-- description: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- value: string (nullable = true)
|    |    |-- code: string (nullable = true)
|    |    |-- cost: integer (nullable = true)

So source data looks like this:

[WrappedArray(a abc 100,b abc 300)]
[WrappedArray(c abc 400)]

I need to use " " (space) as the delimiter, but I don't know how to do this in Scala.

def convert(product: Seq[String]): Seq[Row] = {
    ???
}

I am fairly new to Scala, so can someone guide me on how to construct this type of function?

Thanks.

Upvotes: 0

Views: 3838

Answers (1)

thopaw

Reputation: 4044

I am not sure I understand your problem correctly, but map could be your friend.

case class Row(a: String, b: String, c: Int)
val value = List(List("a", "abc", 123), List("b", "bcd", 321))

value map {
    case List(a: String, b: String, c: Int) => Row(a, b, c)
}

If you have to parse it first:

val value2 = List("a b 123", "c d 345")
value2 map { s =>
    val split = s.split(" ")
    Row(split(0), split(1), split(2).toInt)
}
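
Since the question asks specifically about a UDF: here is a minimal sketch of how the same parsing could be wired into Spark, assuming a Spark 2.x DataFrame named df and a hypothetical case class Description matching the target struct fields (value, code, cost):

import org.apache.spark.sql.functions.{col, udf}

// Hypothetical case class matching the target struct (value, code, cost).
case class Description(value: String, code: String, cost: Int)

// Split each "value code cost" string on spaces and build one struct per entry.
val convertUdf = udf { (descriptions: Seq[String]) =>
    descriptions.map { s =>
        val parts = s.split(" ")
        Description(parts(0), parts(1), parts(2).toInt)
    }
}

// Replace the array-of-strings column with an array-of-structs column.
val converted = df.withColumn("descriptions", convertUdf(col("descriptions")))

After this, converted.printSchema should show descriptions as an array of structs with value, code, and cost fields.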

Upvotes: 2
