Reputation: 835
On input I have a DataFrame similar to:
+-----+-----+
|data1|data2|
+-----+-----+
| 1.0| 0.33|
| 1.0| 0|
| 2.0| 0.33|
| 1.0| 0|
| 1.0| 0.33|
| 2.0| 0.33|
+-----+-----+
After performing the pivot
pivot = df.groupBy('data1').pivot('data2').count()
the structure looks like this:
+-----+----+----+
|data1| 0|0.33|
+-----+----+----+
| 1.0| 2| 2|
| 2.0|null| 2|
+-----+----+----+
Attempting to do anything with the column 0.33 results in
AnalysisException: Can't extract value from 0#1535L;
How can I handle this case?
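For completeness, a minimal sketch that reproduces this (assuming an active SparkSession named spark, and that data2 is a string column, which matches the column names shown above):
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> df = spark.createDataFrame(
...     [(1.0, "0.33"), (1.0, "0"), (2.0, "0.33"),
...      (1.0, "0"), (1.0, "0.33"), (2.0, "0.33")],
...     ["data1", "data2"])
>>> pivot = df.groupBy("data1").pivot("data2").count()
>>> pivot.select("0.33")  # raises the AnalysisException above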
Upvotes: 0
Views: 3364
Reputation: 2108
The problem is that your column name contains a dot. As you can see here:
Spark SQL doesn't support field names that contain dots
This is because Spark parses the dot as a nested-field (struct) accessor, so 0.33 is read as "field 33 of column 0", which produces the extraction error above.
Solution 1
Rename the columns (the new names must not contain dots):
There are many ways to do this; see this SO question. Here is an example taken from it:
>>> from functools import reduce  # built in on Python 2, needs an import on Python 3
>>> oldColumns = pivot.schema.names
>>> newColumns = ["data1", "col1", "col2"]
>>> newPivot = reduce(lambda data, idx: data.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), pivot)
>>> newPivot.show()
>>> newPivot.show()
+-----+----+----+
|data1|col1|col2|
+-----+----+----+
| 1.0| 2| 2|
| 2.0|null| 2|
+-----+----+----+
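As a side note, a shorter way to do the same rename is toDF, which replaces all column names at once (a sketch, assuming the column order shown above):
>>> newPivot = pivot.toDF("data1", "col1", "col2")
>>> newPivot.show()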
Solution 2
Use backticks ( ` ) to quote a column name that contains dots (here is an example):
>>> newPivot = pivot.groupBy().sum("`0.33`")
>>> newPivot.show()
+---------+
|sum(0.33)|
+---------+
| 4|
+---------+
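The same backtick quoting works anywhere a column name is given as a string to be parsed, e.g. in select or filter (a sketch against the pivoted DataFrame above):
>>> pivot.select("`0.33`").show()
>>> pivot.filter("`0.33` > 1").show()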
Upvotes: 3