rmahesh

Reputation: 749

Error with query in pandasql

I am very new to PandaSQL and have never used it before. Here is my code up until now:

import pandas as pd
from pandasql import sqldf
import numpy as np

tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252')
query = """SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) 
FROM TASKS 
GROUP BY Work Item Id, Parent Work Item Id;"""

df = sqldf(query, locals())
print(df.head(5))

I am getting this error:

'pandasql.sqldf.PandaSQLException: (sqlite3.OperationalError) near "Id": syntax error [SQL: 'SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) \n'

Any help would be great!

Edit: After implementing some suggestions from other users below, here is my working code:

import pandas as pd
from pandasql import sqldf
import numpy as np
tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252', low_memory=False)

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM tasks 
GROUP BY [Work Item Id], [Parent Work Item Id];"""

print(sqldf(query, locals()))

Upvotes: 0

Views: 1560

Answers (1)

zwer

Reputation: 25799

If you have column names that contain spaces, you have to quote them to make the SQL valid:

query = """SELECT `Work Item Id`, `Parent Work Item Id`, MAX(`Remaining Work`) 
FROM TASKS 
GROUP BY `Work Item Id`, `Parent Work Item Id`;"""

or

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM TASKS 
GROUP BY [Work Item Id], [Parent Work Item Id];"""

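Which form you need depends on the SQL flavor pandasql expects. The sqlite3.OperationalError in your traceback shows the query is being run by SQLite, which accepts both forms.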
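To see the quoting behavior in isolation, here is a minimal sketch that talks to SQLite directly through Python's standard sqlite3 module instead of going through pandasql. The three sample rows and the run helper are made up for illustration; only the column names come from the question.

```python
import sqlite3

# Hypothetical sample rows standing in for the CSV from the question:
# (Work Item Id, Parent Work Item Id, Remaining Work)
rows = [
    (1, 10, 4.0),
    (1, 10, 6.5),
    (2, 10, 3.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tasks ('
    '"Work Item Id" INTEGER, "Parent Work Item Id" INTEGER, "Remaining Work" REAL)'
)
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)", rows)


def run(query):
    """Return the result rows, or the error text if SQLite rejects the query."""
    try:
        return conn.execute(query).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# Unquoted names with spaces: SQLite cannot parse the column list.
unquoted = run(
    "SELECT Work Item Id, MAX(Remaining Work) FROM tasks GROUP BY Work Item Id"
)

# The same aggregate with square-bracket quoting.
bracketed = run(
    "SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) "
    "FROM tasks GROUP BY [Work Item Id], [Parent Work Item Id] "
    "ORDER BY [Work Item Id]"
)

# And with backtick quoting.
backticked = run(
    "SELECT `Work Item Id`, `Parent Work Item Id`, MAX(`Remaining Work`) "
    "FROM tasks GROUP BY `Work Item Id`, `Parent Work Item Id` "
    "ORDER BY `Work Item Id`"
)

print(unquoted)
print(bracketed)
print(backticked)
```

The unquoted query comes back as an error string, while the two quoted variants return the same grouped rows. The same quoted query text should carry over to sqldf, since pandasql hands it to SQLite by default.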

Upvotes: 2
