Subhransu Nanda

Reputation: 319

AnalysisException: Can't extract value from place#14: need struct type but got double

I'm trying to find missing and null values in my dataframe, but I'm getting an exception. I have included only the first few fields of the schema below:

root
|-- created_at: string (nullable = true)
|-- id: long (nullable = true)
|-- id_str: string (nullable = true)
|-- text: string (nullable = true)
|-- display_text_range: string (nullable = true)
|-- source: string (nullable = true)
|-- truncated: boolean (nullable = true)
|-- in_reply_to_status_id: double (nullable = true)
|-- in_reply_to_status_id_str: string (nullable = true)
|-- in_reply_to_user_id: double (nullable = true)
|-- in_reply_to_user_id_str: string (nullable = true)
|-- in_reply_to_screen_name: string (nullable = true)
|-- geo: double (nullable = true)
|-- coordinates: double (nullable = true)
|-- place: double (nullable = true)
|-- contributors: string (nullable = true)

Here is the code that throws the exception while counting missing and null values:

df_mis = df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns])
df_mis.show()

Here are the AnalysisException details:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<ipython-input-20-6ccaacbbcc7f> in <module>()
----> 1 df_mis = df.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c) for c in df.columns])
      2 df_mis.show()

2 frames
/content/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/dataframe.py in select(self, *cols)
   1683         [Row(name='Alice', age=12), Row(name='Bob', age=15)]
   1684         """
-> 1685         jdf = self._jdf.select(self._jcols(*cols))
   1686         return DataFrame(jdf, self.sql_ctx)
   1687 

/content/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1308         answer = self.gateway_client.send_command(command)
   1309         return_value = get_return_value(
-> 1310             answer, self.gateway_client, self.target_id, self.name)
   1311 
   1312         for temp_arg in temp_args:

/content/spark-3.2.0-bin-hadoop3.2/python/pyspark/sql/utils.py in deco(*a, **kw)
    115                 # Hide where the exception came from that shows a non-Pythonic
    116                 # JVM exception message.
--> 117                 raise converted from None
    118             else:
    119                 raise

AnalysisException: Can't extract value from place#14: need struct type but got double

Upvotes: 3

Views: 21179

Answers (1)

Subhransu Nanda

Reputation: 319

I solved this issue by replacing the dots (".") in my column names with underscores. I found the following Stack Overflow post very helpful; to quote from it, "The error is there because (.)dot is used to access a struct field".

Extracting value from data frame thorws error because of the . in the column name in spark
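The renaming itself can be sketched like this; note the column names used here are hypothetical examples (the original schema above is dot-free, but the same idea applies to any dataframe whose column names contain dots):

```python
# Sketch of the fix: strip dots from column names so Spark stops
# interpreting "a.b" as struct-field access on column "a".

def sanitize_columns(columns):
    """Return the column names with every dot replaced by an underscore."""
    return [c.replace(".", "_") for c in columns]

cols = ["user.id", "user.screen_name", "place"]
print(sanitize_columns(cols))  # ['user_id', 'user_screen_name', 'place']
```

On a real dataframe this can be applied in one shot with `df = df.toDF(*sanitize_columns(df.columns))`. Alternatively, a dotted name can be escaped with backticks, e.g. ``col("`user.id`")``, so Spark treats it as a literal column name rather than struct access.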

Upvotes: 7
