Reputation: 3609
I use Spark 1.3.
My data has 50 or more attributes, so I went with a custom class.
How do I access a field of a custom class by its name rather than by its position?
Right now I have to call productElement(0) every time.
Also, I am not supposed to use a case class, so I am using a custom class for the schema.
class OnlineEvents(gsm_id: String,
                   attribution_id: String,
                   event_date: String,
                   event_timestamp: String,
                   event_type: String
                  ) extends Product {
  // Positional access required by Product; index 0 is gsm_id.
  override def productElement(n: Int): Any = n match {
    case 0 => gsm_id
    case 1 => attribution_id
    case 2 => event_date
    case 3 => event_timestamp
    case 4 => event_type
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
  override def productArity: Int = 5
  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]
}
My Spark code:
val onlineRDD = sc.textFile("/user/cloudera/input_files/online_events.txt")
val schemaRDD = onlineRDD.map(record => {
  val arr: Array[String] = record.split(",")
  new OnlineEvents(arr(0), arr(1), arr(2), arr(3), arr(4))
})
val keyvalueRDD = schemaRDD.map(online => ((online.productElement(0).toString, online.productElement(4).toString), online))
If I want to access any field of OnlineEvents, I have to use productElement() (i.e. online.productElement(0) for gsm_id).
Can I access the fields directly as online.gsm_id ... online.event_type, so that my code is easier to read?
How do I access a field directly by its name when I use a custom class for the schema?
Upvotes: 0
Views: 70
Reputation: 74749
I strongly recommend using a case class per use case (which all together cover all the use cases that use the data).
A single use case would then be a single case class that would save you a lot of thinking about how to maintain the 50+ fields.
Yes, you'd "trade" a single big class with 50 or more fields for ten 5-field case classes, but given how easy it is to create a case class and how nicely they would describe your data, I think it's worth the hassle.
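To make that concrete, here is a minimal sketch of what "one case class per use case" could look like. The names EventKey, EventTiming and EventViews, and the way the fields are grouped, are my own assumptions for illustration; only the field names and their positions come from the question.

```scala
// Hypothetical per-use-case views over the same comma-separated record.
// Each case class carries only the fields that one use case needs.
case class EventKey(gsm_id: String, event_type: String)
case class EventTiming(event_date: String, event_timestamp: String)

object EventViews {
  // Field positions follow the order used in the question's OnlineEvents class.
  def keyOf(fields: Array[String]): EventKey =
    EventKey(fields(0), fields(4))

  def timingOf(fields: Array[String]): EventTiming =
    EventTiming(fields(2), fields(3))
}
```

With something like this, a map step could build EventViews.keyOf(record.split(",")) and then read key.gsm_id or key.event_type by name. Each case class stays small, which also keeps it clear of the 22-field case class limit in Scala 2.10.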
Upvotes: 0
Reputation: 41987
As I understand your question, you need to define accessor methods inside OnlineEvents that return the field values. So your solution should be
class OnlineEvents(gsm_id: String,
                   attribution_id: String,
                   event_date: String,
                   event_timestamp: String,
                   event_type: String
                  ) extends Product {
  // Named accessors, one per constructor parameter.
  def get_gsm_id(): String = gsm_id
  def get_attribution_id(): String = attribution_id
  def get_event_date(): String = event_date
  def get_event_timestamp(): String = event_timestamp
  def get_event_type(): String = event_type

  override def productElement(n: Int): Any = n match {
    case 0 => gsm_id
    case 1 => attribution_id
    case 2 => event_date
    case 3 => event_timestamp
    case 4 => event_type
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
  override def productArity: Int = 5
  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]
}
And call the functions as below
val keyvalueRDD = schemaRDD.map(online => ((online.get_gsm_id().toString, online.get_event_type().toString), online))
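If you would rather write online.gsm_id exactly as in the question, an alternative to hand-written getters is to mark each constructor parameter as a val, which makes Scala expose it as a public read-only member. Below is a minimal sketch of that variant; the helper object OnlineEventKeys and its parse and keyOf methods are hypothetical names I added so the example runs without Spark.

```scala
// Sketch: `val` on a constructor parameter makes it a public, read-only field.
class OnlineEvents(val gsm_id: String,
                   val attribution_id: String,
                   val event_date: String,
                   val event_timestamp: String,
                   val event_type: String
                  ) extends Product {
  override def productElement(n: Int): Any = n match {
    case 0 => gsm_id
    case 1 => attribution_id
    case 2 => event_date
    case 3 => event_timestamp
    case 4 => event_type
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
  override def productArity: Int = 5
  override def canEqual(that: Any): Boolean = that.isInstanceOf[OnlineEvents]
}

// Hypothetical helpers standing in for the bodies of the RDD map steps.
object OnlineEventKeys {
  // Split one comma-separated line into an OnlineEvents instance.
  def parse(record: String): OnlineEvents = {
    val arr = record.split(",")
    new OnlineEvents(arr(0), arr(1), arr(2), arr(3), arr(4))
  }

  // Build the (gsm_id, event_type) key using field names, not positions.
  def keyOf(online: OnlineEvents): (String, String) =
    (online.gsm_id, online.event_type)
}
```

In the Spark job the key-value step would then read schemaRDD.map(online => ((online.gsm_id, online.event_type), online)), with no productElement calls and no .toString, since the fields are already typed as String.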
Upvotes: 1