Reputation: 1344
I'm not sure if I'm in the correct group for this question. Any I have created the following sql code in Databricks, however I'm getting the error message;
Error in SQL statement: AnalysisException: cannot resolve '
a.COUNTRY_ID
' given input columns: [a."PK_LOYALTYACCOUNT";"COUNTRY_ID";"CDC_TYPE", b."PK_LOYALTYACCOUNT";"COUNTRY_ID";"CDC_TYPE"]; line 7 pos 7;
I know the code works as I have successfully run the code on my SQL Server The code is as follows:
tabled = spark.read.csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",inferSchema=True,header=True)
tablee = spark.read.csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tablee.csv",inferSchema=True,header=True)
tabled.createOrReplaceTempView('tabled')
tablee.createOrReplaceTempView('tablee')
%sql
; with cmn as
( SELECT a.CDC_TYPE,
a. PK_LOYALTYACCOUNT, --Add these also in CTE result set
a.COUNTRY_ID --Add these also in CTE result set
FROM tabled a
INNER JOIN tablee b
ON a.COUNTRY_ID = b.COUNTRY_ID
AND a.PK_LOYALTYACCOUNT = b.PK_LOYALTYACCOUNT
AND a.CDC_TYPE = 'U'
)
SELECT 1 AS is_deleted,
a.*
FROM tabled a
INNER JOIN cmn
ON a.CDC_TYPE = cmn.CDC_TYPE
and a.COUNTRY_ID = cmn.COUNTRY_ID
AND a.PK_LOYALTYACCOUNT = cmn.PK_LOYALTYACCOUNT
UNION ALL
SELECT 0 AS is_deleted,
b.*
FROM tablee b
INNER JOIN cmn
ON b.CDC_TYPE = cmn.CDC_TYPE
and b.COUNTRY_ID = cmn.COUNTRY_ID
AND b.PK_LOYALTYACCOUNT = cmn.PK_LOYALTYACCOUNT
UNION ALL
SELECT NULL,
a.*
FROM tabled a
WHERE a.CDC_TYPE = 'N'
UNION ALL
SELECT NULL,
b.*
FROM tablee b
WHERE b.CDC_TYPE = 'N'
when I run the simple query...
example1 =
spark.sql("""select * from tablee""")
or example2 =
spark.sql("""select * from tabled""")
I get the following output, so I know the tables are there
Any suggestions will be well received.
Upvotes: 1
Views: 7775
Reputation: 64
Use semicolon delimiter while reading from csv
tabled = spark.read.option("delimiter", ";").csv("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",inferSchema=True,header=True)
or
tabled = spark.read.load("adl://carlslake.azuredatalakestore.net/testfolder/dbo_tabled.csv",
format="csv", sep=";", inferSchema="true", header="true")
ref: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options
Upvotes: 1
Reputation: 1344
The columns were not being identified properly since the delimiter used was a semicolon(;) and the job was looking for commas. Problem solved
Upvotes: 0