Reputation: 275
I have the following schema -
[name: StringType, grades: ArrayType(StructType(StructField(subject_grades, ArrayType(StructType(StructField(subject, StringType, false), StructField(grade, LongType, false))))))]
I want to group by the subject field inside the subject_grades array, which is itself inside the grades array.
I tried
sql.sql("select ... from grades_table group by grades.subject_grades.subject")
but I get
org.apache.spark.sql.AnalysisException: cannot resolve 'grades.subject_grades[subject]' due to data type mismatch: argument 2 requires integral type, however, 'subject' is of string type.;
I understand why I get this error, however I was hoping I could avoid exploding the entire thing in order to group by on the inner field.
Upvotes: 1
Views: 496
Reputation: 74669
Arrays are (relatively) hard to work with and beg for explode (or flatMap) when the main query requires the elements inside, e.g. for grouping.
Something I learnt from the question is that, with subject_grades being of type ArrayType, the following clause is interpreted so that subject is taken as an array index, hence the requirement for an integral type:

group by grades.subject_grades.subject
I see no other way but to use explode (or flatMap) to "destructure" the subject_grades array and do the grouping.
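A minimal sketch of that approach, assuming a SparkSession named spark and a DataFrame df registered as grades_table with the question's schema; the aggregation (an average of grade) and all aliases are illustrative, not from the original post:

```scala
import org.apache.spark.sql.functions.{avg, col, explode}

// DataFrame API: explode both array levels, then group by the inner field.
val grouped = df
  .select(explode(col("grades")).as("g"))            // one row per grades element
  .select(explode(col("g.subject_grades")).as("sg")) // one row per subject_grades element
  .groupBy(col("sg.subject"))
  .agg(avg(col("sg.grade")).as("avg_grade"))

// Equivalent in SQL, using LATERAL VIEW explode:
spark.sql("""
  SELECT sg.subject, avg(sg.grade) AS avg_grade
  FROM grades_table
  LATERAL VIEW explode(grades) t1 AS g
  LATERAL VIEW explode(g.subject_grades) t2 AS sg
  GROUP BY sg.subject
""")
```

Both forms produce one row per (grades element, subject_grades element) pair before grouping, which is exactly the "destructuring" the answer describes.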
Upvotes: 1