Digvijay Sawant

Reputation: 1079

SparkSQL: cannot resolve 'column_name' given input columns

I have a simple piece of code here:

query = """
    select id, date, type from schema.camps
"""
df = spark.sql(query)

I get an error that says:

> "cannot resolve '`id`' given input columns: [ecs_snapshot, ecs_version, ecs_bundle_type]; line 2 pos 11"

  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "cannot resolve '`id`' given input columns: [ecs_snapshot, ecs_version, ecs_bundle_type]; line 2 pos 11;"

I have tried everything I could based on the solutions I found. The funny part is that I have another query on another table that works just fine. I would appreciate any help with this. Thanks in advance.

Here is the schema of the table:

camps (
    id numeric(38,0) NOT NULL encode raw,
    name varchar(765) NULL encode zstd,
    type varchar(765) NULL encode zstd,
    YYYY varchar(765) NULL encode zstd,
    ZZZZ varchar(765) NULL encode zstd,
    LLLL varchar(765) NULL encode zstd,
    MMMM numeric(38,0) NULL encode zstd,
    NNNN varchar(765) NULL encode zstd,
    date timestamp without time zone NULL encode zstd,
    PPPP numeric(38,0) NULL encode az64,
    PRIMARY KEY (marketplace_id, campaign_id)
);
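
For reference, this is one way to list the columns Spark actually resolves for that table name (a minimal sketch; schema.camps is just the table name as written in the query above):

# list the columns Spark resolves for the table referenced in the query
spark.table("schema.camps").printSchema()

# or the same thing through SQL
spark.sql("DESCRIBE schema.camps").show(truncate=False)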

Upvotes: 1

Views: 14618

Answers (2)

mvasyliv

Reputation: 1214

Please try running this code and show the result:

import spark.implicits._  // needed for the 'symbol column syntax below

val df1 = spark.table("ads.dim_campaigns")
df1.printSchema()
// please show this result; it lists the columns Spark actually sees

val df2 = df1.select(
  'campaign_id,
  'external_id,
  'start_date,
  'program_type,
  'advertiser_id
)
df2.printSchema()
// please show this result as well
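
For PySpark, a rough equivalent of the check above (assuming the same ads.dim_campaigns table and column names) would be:

df1 = spark.table("ads.dim_campaigns")
df1.printSchema()  # shows the columns Spark actually resolves for this table

# selecting the columns in question raises AnalysisException if any of them is missing
df2 = df1.select("campaign_id", "external_id", "start_date", "program_type", "advertiser_id")
df2.printSchema()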

Upvotes: 0

Grzegorz

Reputation: 1353

The camp.campaign_id column does not exist in the table ads.dim_campaigns. The error message lists the columns Spark can actually see in that table.

This query works:

>>> l = [[1],[2],[3]]
>>> df = spark.createDataFrame(l,['col_1'])
>>> df.createOrReplaceTempView('table')
>>> query = """SELECT table_alias.col_1 FROM table table_alias"""
>>> spark.sql(query).show()
+-----+
|col_1|
+-----+
|    1|
|    2|
|    3|
+-----+

This query gives the same error as yours (note col_x instead of col_1):

>>> l = [[1],[2],[3]]
>>> df = spark.createDataFrame(l,['col_1'])
>>> df.createOrReplaceTempView('table')
>>> query = """SELECT table_alias.col_x FROM table table_alias"""
>>> spark.sql(query).show()

/.../
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/local/Cellar/apache-spark/2.4.5/libexec/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "cannot resolve '`table_alias.col_x`' given input columns: [table_alias.col_1];

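A quick way to guard against this (a minimal sketch reusing the temp view from the example above) is to check df.columns before building the query:

>>> wanted = 'col_x'
>>> if wanted in df.columns:
...     spark.sql("SELECT table_alias.{} FROM table table_alias".format(wanted)).show()
... else:
...     print("column {} not found; available columns: {}".format(wanted, df.columns))
...
column col_x not found; available columns: ['col_1']
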
Upvotes: 1
