Reputation: 1079
I have a simple piece of code here:
query = """
select id, date, type from schema.camps
"""
df = spark.sql(query)
I get an error that says:

File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "cannot resolve '`id`' given input columns: [ecs_snapshot, ecs_version, ecs_bundle_type]; line 2 pos 11;"
I've tried everything I could based on the solutions provided. The funny part is that I have another query on another table that works just fine. I would appreciate any help with this. Thanks in advance.
Here is the schema of the table:
camps(
id numeric(38,0) NOT NULL encode raw,
name varchar(765) NULL encode zstd,
type varchar(765) NULL encode zstd,
YYYY varchar(765) NULL encode zstd,
ZZZZ varchar(765) NULL encode zstd,
LLLL varchar(765) NULL encode zstd,
MMMM numeric(38,0) NULL encode zstd,
NNNN varchar(765) NULL encode zstd,
date timestamp without time zone NULL encode zstd,
PPPP numeric(38,0) NULL encode az64,
PRIMARY KEY (marketplace_id, campaign_id)
)
;
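The error says that, whatever Spark's catalog has registered under `schema.camps`, it only exposes the columns `ecs_snapshot`, `ecs_version`, and `ecs_bundle_type`, not the columns in the Redshift DDL above. A quick way to debug is to run `spark.table("schema.camps").printSchema()` and compare the output against what the query selects. A minimal sketch of that comparison in plain Python (the column names below are taken from the error message, not from a live cluster):

```python
# Columns Spark reported in the AnalysisException for schema.camps
actual = {"ecs_snapshot", "ecs_version", "ecs_bundle_type"}

# Columns the failing query tries to select
requested = {"id", "date", "type"}

# Any requested column that is missing from the table Spark actually sees
missing = sorted(requested - actual)
print("missing columns:", missing)  # here every requested column is missing
```

If `missing` is non-empty, the query is pointed at a different table (or a stale/overwritten catalog entry) than the one whose DDL is shown above.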
Upvotes: 1
Views: 14618
Reputation: 1214
Please try running this code and show the result:
import spark.implicits._
val df1 = spark.table("ads.dim_campaigns")
df1.printSchema()
// please show the result of this printSchema
val df2 = df1.select(
'campaign_id,
'external_id,
'start_date,
'program_type,
'advertiser_id
)
df2.printSchema()
// please show the result of this printSchema
Upvotes: 0
Reputation: 1353
The camp.campaign_id column does not exist in the table ads.dim_campaigns.
This query works:
>>> l = [[1],[2],[3]]
>>> df = spark.createDataFrame(l,['col_1'])
>>> df.createOrReplaceTempView('table')
>>> query = """SELECT table_alias.col_1 FROM table table_alias"""
>>> spark.sql(query).show()
+-----+
|col_1|
+-----+
| 1|
| 2|
| 3|
+-----+
This query gives the same error as yours (note col_x instead of col_1):
>>> l = [[1],[2],[3]]
>>> df = spark.createDataFrame(l,['col_1'])
>>> df.createOrReplaceTempView('table')
>>> query = """SELECT table_alias.col_x FROM table table_alias"""
>>> spark.sql(query).show()
/.../
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "cannot resolve '`table_alias.col_x`' given input columns: [table_alias.col_1];
Upvotes: 1