user1860447
user1860447

Reputation: 1436

AnalysisException: cannot resolve given input columns:

I am running in to this error when I am trying to select a couple of columns from the temporary table.

pd_df = pd.read_sql('select * from abc.cars limit 10', conn)

df = spark.createDataFrame(pd_df)
df.createOrReplaceTempView("cars_tmp")
df.show()

print('***************')
print("Reading from tmp table")
data = spark.sql('select location_id from cars_tmp')
data.show()


AnalysisException: cannot resolve '`location_id`' given input columns: [cars_tmp.abc.product_id, cars_tmp.abc.location_id ...]

When I select all the columns I get the results. So this is successful:

data = spark.sql('select * from cars_tmp')
data.show()

I tried below queries but they fail as well with the same error:

data = spark.sql('select cars_tmp.abc.location_id from cars_tmp')
data.show()

data = spark.sql('select cars_tmp.location_id from cars_tmp')
data.show()

data = spark.sql('select abc.location_id from cars_tmp')
data.show()

I am running these in datbricks. Databricks runtime version: 7.0 Apache Spark version: 3.0 scala: 2.12

or "spark_version": "7.0.x-scala2.12",

Any help will be highly appreciated.

Thanks

Upvotes: 0

Views: 24006

Answers (2)

user1860447
user1860447

Reputation: 1436

I resolved the issue by add each column in the panda select query. So something like this:

pd_df = pd.read_sql('select id, location_id, product_id from abc.cars limit 10', conn)

Upvotes: 0

Grzegorz
Grzegorz

Reputation: 1363

The column name does not exist in the table. select * from cars_tmp works because you do not specify the column name.

Please see this answer https://stackoverflow.com/a/64042756/8913402 with the same error handling.

Upvotes: 1

Related Questions