Reputation: 11
Below is my code block:
conll_data.select(F.explode(F.arrays_zip('token.result','label.result')).alias("cols")) \
.select(F.expr("cols['0']").alias("token"),
F.expr("cols['1']").alias("ground_truth"))\
.groupBy('ground_truth')\
.count()\
.orderBy('count', ascending=False)\
.show(100,truncate=False)
and I am getting the error below:
AnalysisException: No such struct field 0 in result, result
and my requirements.txt is as follows:
jupyterlab
SQLAlchemy==0.7.1
spark-nlp==3.4.4
pyspark==3.1.2
numpy== 1.19.2
pandas==1.3.2
openpyxl==3.0.9
jupyter_contrib_nbextensions
spark-nlp-display
pyarrow==3.0.0
streamlit==1.1.0
scipy==1.7.3
Tensorflow==2.5.0
tensorflow-addons
python==3.7.4
Upvotes: 0
Views: 176
Reputation: 5135
I ran into this issue the other day. arrays_zip actually produces a struct whose fields are named 0 and 1. This is an issue because plain numbers are not valid unquoted identifiers, so you need to escape each field name so that SQL knows to treat it as a column name. This is done using backticks (`), so the fields can be referred to as `0` and `1`. You need to refer to them as struct fields (cols.`0`), not as array elements (cols['0']).
conll_data.select(F.explode(F.arrays_zip('token.result','label.result')).alias("cols")) \
.select(F.col("cols.`0`").alias("token"),
F.col("cols.`1`").alias("ground_truth"))\
.groupBy('ground_truth')\
.count()\
.orderBy('count', ascending=False)\
.show(100,truncate=False)
# Minimal example to show this works (df can be any existing DataFrame):
from pyspark.sql.functions import arrays_zip, array_repeat, lit
df.select(arrays_zip(array_repeat(lit(0), 10), array_repeat(lit(1), 10)).alias("cols")).select("cols.`1`").show()
+--------------------+
| 1|
+--------------------+
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
|[1, 1, 1, 1, 1, 1...|
+--------------------+
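If you build these references in more than one place, the backtick escaping can be wrapped in a small helper. This is only a sketch: `quoted_field` is a hypothetical name of my own, not part of PySpark, and it assumes the usual Spark SQL rule that a literal backtick inside a quoted identifier is written as two backticks.

```python
def quoted_field(parent: str, field) -> str:
    """Return a dotted reference to a struct field with the field name backtick-quoted.

    Hypothetical helper for illustration; not a PySpark API.
    """
    # Assumption: a literal backtick inside a quoted identifier is escaped by doubling it.
    escaped = str(field).replace("`", "``")
    return f"{parent}.`{escaped}`"


# Build the two references used in the answer above.
token_ref = quoted_field("cols", 0)
label_ref = quoted_field("cols", 1)
print(token_ref)
print(label_ref)
```

The resulting strings could then be passed to F.col, for example F.col(quoted_field("cols", 0)).alias("token").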
Upvotes: 1