Surender Raja

Reputation: 3609

In Spark, how do I read a field by its name instead of by its index?

I use Spark 1.3.

My data has 50 or more attributes, so I went for a custom class.

How do I access a field of a custom class by its name rather than by its position?

At the moment I have to call productElement(0) every time.

Also, I am not supposed to use a case class, hence I am using a custom class for the schema.

 class OnlineEvents(gsm_id:String,
          attribution_id:String,
          event_date:String,
          event_timestamp:String,
          event_type:String
          ) extends Product {

  override def productElement(n: Int): Any = n match {
  case 0 => gsm_id
  case 1 => attribution_id
  case 2 => event_date
  case 3 => event_timestamp
  case 4 => event_type

  case _ => throw new IndexOutOfBoundsException(n.toString)
 }

  override def productArity: Int = 5

  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]

 }

My Spark code:

  val onlineRDD = sc.textFile("/user/cloudera/input_files/online_events.txt")

  val schemaRDD = onlineRDD.map(record => {
    val arr: Array[String] = record.split(",")
    new OnlineEvents(arr(0), arr(1), arr(2), arr(3), arr(4))
  })

  val keyvalueRDD = schemaRDD.map(online => ((online.productElement(0).toString, online.productElement(4).toString), online))

If I want to access any field of OnlineEvents, I have to use productElement() (i.e. online.productElement(0) for gsm_id).

Can I access the fields directly as online.gsm_id ... online.event_type, so that my code is easier to read?

How do I access a field directly by its name when I use a custom class for the schema?

Upvotes: 0

Views: 70

Answers (2)

Jacek Laskowski

Reputation: 74749

I strongly recommend using one case class per use case, so that together they cover every use of the data.

Each use case would then have its own small case class, which saves you a lot of thinking about how to maintain the 50+ fields.

Yeah, you'd "trade" a single big class with 50 or more fields for ten 5-field case classes, but given how easy it is to create a case class and how nicely they would describe your data, I think it's worth the hassle.
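As a rough sketch of this idea, a use case that only needs the key fields could get its own small case class. The name `EventKey`, the helper `toEventKey`, and the sample record are hypothetical and not taken from the question; the column positions follow the question's parsing code.

```scala
// Hypothetical use-case class holding only the two fields used as the key.
case class EventKey(gsm_id: String, event_type: String)

// Parse one comma-separated record into the small case class.
// Columns 0 and 4 are gsm_id and event_type in the question's layout.
def toEventKey(record: String): EventKey = {
  val arr = record.split(",")
  EventKey(arr(0), arr(4))
}

// Hypothetical sample record.
val key = toEventKey("g1,a1,2015-01-01,10:00:00,click")

// Fields are read by name; equals and hashCode are generated by the compiler.
println(key.gsm_id + " / " + key.event_type)
```

In a Spark job the same function could be passed to `map` over the RDD of text lines, assuming the case class is defined where Spark can serialize it.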

Upvotes: 0

Ramesh Maharjan

Reputation: 41987

As I understand your question, you need to define some accessor methods inside OnlineEvents that return the field values. So your solution should be:

class OnlineEvents(gsm_id:String,
                   attribution_id:String,
                   event_date:String,
                   event_timestamp:String,
                   event_type:String
                  ) extends Product {
  def get_gsm_id(): String ={
    gsm_id
  }

  def get_attribution_id(): String ={
    attribution_id
  }

  def get_event_date(): String ={
    event_date
  }

  def get_event_timestamp(): String ={
    event_timestamp
  }

  def get_event_type(): String ={
    event_type
  }

  override def productElement(n: Int): Any = n match {
    case 0 => gsm_id
    case 1 => attribution_id
    case 2 => event_date
    case 3 => event_timestamp
    case 4 => event_type

    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  override def productArity: Int = 5

  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]

}

And call the functions as below:

val keyvalueRDD = schemaRDD.map(online => ((online.get_gsm_id().toString, online.get_event_type().toString), online))
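A shorter alternative to hand-written getters, as a minimal sketch: prefixing each constructor parameter with `val` makes Scala generate a public accessor, so `online.gsm_id` can be used directly. This assumes the class is allowed to expose its fields publicly. `Serializable` is added here as an assumption, since Spark may need to ship these objects between nodes, and the sample record is hypothetical.

```scala
// Sketch: `val` on each constructor parameter turns it into a public, read-only field.
class OnlineEvents(val gsm_id: String,
                   val attribution_id: String,
                   val event_date: String,
                   val event_timestamp: String,
                   val event_type: String
                  ) extends Product with Serializable {

  // Positional access is kept so the class still satisfies Product.
  override def productElement(n: Int): Any = n match {
    case 0 => gsm_id
    case 1 => attribution_id
    case 2 => event_date
    case 3 => event_timestamp
    case 4 => event_type
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  override def productArity: Int = 5

  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]
}

// Hypothetical record, parsed the same way as in the question.
val arr = "g1,a1,2015-01-01,10:00:00,click".split(",")
val online = new OnlineEvents(arr(0), arr(1), arr(2), arr(3), arr(4))

// Fields are now readable by name; no productElement(n) or toString is needed.
val key = (online.gsm_id, online.event_type)
```

With this definition, the mapping in the question could be written as `schemaRDD.map(online => ((online.gsm_id, online.event_type), online))`.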

Upvotes: 1
