Reputation: 11
According to the Pandas Query Engine documentation, the example code only allows a single df to be connected. I would like to connect multiple dfs. This works in PandasAI through SmartDataLake, but I prefer the descriptive answers from Pandas Query Engine, which come from passing the result through the LLM a second time. Is there any way to make it work?
Documentation code:
df = pd.read_csv("./titanic_train.csv") #Only 1 dataframe
instruction_str = (
"1. Convert the query to executable Python code using Pandas.\n"
"2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
"3. The code should represent a solution to the query.\n"
"4. PRINT ONLY THE EXPRESSION.\n"
"5. Do not quote the expression.\n"
)
pandas_prompt_str = (
"You are working with a pandas dataframe in Python.\n"
"The name of the dataframe is `df`.\n"
"This is the result of `print(df.head())`:\n"
"{df_str}\n\n"
"Follow these instructions:\n"
"{instruction_str}\n"
"Query: {query_str}\n\n"
"Expression:"
)
response_synthesis_prompt_str = (
"Given an input question, synthesize a response from the query results.\n"
"Query: {query_str}\n\n"
"Pandas Instructions (optional):\n{pandas_instructions}\n\n"
"Pandas Output: {pandas_output}\n\n"
"Response: "
)
pandas_prompt = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df_str=df.head(5)
)
I am trying the code below for multiple dataframes:
instruction_str = (
"1. Convert the query to executable Python code using Pandas.\n"
"2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
"3. The code should represent a solution to the query.\n"
"4. PRINT ONLY THE EXPRESSION.\n"
"5. Do not quote the expression.\n"
)
pandas_prompt_str = (
"You are working with 3 pandas dataframes in Python.\n"
"The names of the dataframes are `df1`, `df2` and `df3`.\n"
"This is the result of `print(df1.head())`:\n"
"{df1_str}\n\n"
"This is the result of `print(df2.head())`:\n"
"{df2_str}\n\n"
"This is the result of `print(df3.head())`:\n"
"{df3_str}\n\n"
"Follow these instructions:\n"
"{instruction_str}\n"
"Query: {query_str}\n\n"
"Expression:"
)
response_synthesis_prompt_str = (
"Given an input question, synthesize a response from the query results.\n"
"Query: {query_str}\n\n"
"Pandas Instructions (optional):\n{pandas_instructions}\n\n"
"Pandas Output: {pandas_output}\n\n"
"Response: "
)
pandas_prompt1 = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df1_str=df1.head(1)
)
pandas_output_parser1 = PandasInstructionParser(df1)
pandas_prompt2 = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df2_str=df2.head(1)
)
pandas_output_parser2 = PandasInstructionParser(df2)
pandas_prompt3 = PromptTemplate(pandas_prompt_str).partial_format(
instruction_str=instruction_str, df3_str=df3.head(1)
)
pandas_output_parser3 = PandasInstructionParser(df3)
response_synthesis_prompt = PromptTemplate(response_synthesis_prompt_str)
We get the following error:
ValueError: Module input keys must have exactly one key if dest_key is not specified. Remaining keys: in module: {'df2_str', 'query_str', 'df1_str'}
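The message suggests that each prompt still has more than one unfilled template variable: every `partial_format` call above fills only one of `df1_str`, `df2_str`, `df3_str`, so the other two remain open alongside `query_str`. The sketch below does not use LlamaIndex at all; it only uses the standard library to show which placeholders stay open in each case. The helper `remaining_keys` and the shortened template are hypothetical, written just for this illustration.

```python
from string import Formatter


def remaining_keys(template: str, **filled) -> set:
    """Return the placeholder names in `template` that `filled` does not cover."""
    names = {name for _, name, _, _ in Formatter().parse(template) if name}
    return names - filled.keys()


# Shortened stand-in for pandas_prompt_str (assumption: same placeholder names).
template = (
    "df1 head:\n{df1_str}\n\n"
    "df2 head:\n{df2_str}\n\n"
    "df3 head:\n{df3_str}\n\n"
    "Follow these instructions:\n{instruction_str}\n"
    "Query: {query_str}\n"
)

# Filling one head per prompt, as in the attempt above, leaves the other two open.
partial = remaining_keys(template, instruction_str="...", df1_str="...")

# Filling all three heads in a single prompt leaves only the query open.
full = remaining_keys(
    template,
    instruction_str="...",
    df1_str="...",
    df2_str="...",
    df3_str="...",
)

print(sorted(partial))
print(sorted(full))
```

If this reading is right, passing all three heads to one `partial_format` call would leave `query_str` as the single remaining key. Note that `PandasInstructionParser` is still constructed with a single dataframe, so evaluating expressions that refer to `df1`, `df2` and `df3` would need separate handling.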
Upvotes: 0
Views: 1095
Reputation: 1
I faced the same challenge on a similar project. I started by concatenating the dataframes into one. In the instruction prompt I then described the metadata of every original dataframe and added a few few-shot examples of queries against the concatenated dataframe.
Here is a resource on few-shot prompting: Fewshot prompting
instruction_str = (
"1. Convert the query to executable Python code using Pandas.\n"
"2. The final line of code should be a Python expression that can be called with the `eval()` function.\n"
"3. The code should represent a solution to the query.\n"
"4. PRINT ONLY THE EXPRESSION.\n"
# After this line, define the metadata of the concatenated df, then add the few-shot examples.
# If you need more accurate responses, add chain-of-thought (CoT) instructions as well.
"5. Do not quote the expression.\n"
)
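As a rough sketch of what that could look like, the snippet below builds the instruction string with a metadata section and few-shot examples placed after line 4, as described above. Everything specific here is an assumption for illustration: the `source` and `id` columns, the example query, and the helper `build_instruction_str` are hypothetical and not taken from the original project.

```python
# Hypothetical metadata for the concatenated frame; column names are illustrative only.
df_metadata = {
    "source": "which original dataframe the row came from: 'df1', 'df2' or 'df3'",
    "id": "row identifier shared across the original dataframes",
}

# Hypothetical few-shot pairs: (query, expected pandas expression).
few_shot_examples = [
    ("How many rows come from df2?", "(df['source'] == 'df2').sum()"),
]


def build_instruction_str(metadata, examples):
    """Assemble the instructions, inserting metadata and examples after line 4."""
    lines = [
        "1. Convert the query to executable Python code using Pandas.",
        "2. The final line of code should be a Python expression that can be "
        "called with the `eval()` function.",
        "3. The code should represent a solution to the query.",
        "4. PRINT ONLY THE EXPRESSION.",
        "Column metadata for `df`:",
    ]
    lines += [f"- {col}: {desc}" for col, desc in metadata.items()]
    lines.append("Examples:")
    for query, expr in examples:
        lines.append(f"Query: {query}")
        lines.append(f"Expression: {expr}")
    lines.append("5. Do not quote the expression.")
    return "\n".join(lines) + "\n"


instruction_str = build_instruction_str(df_metadata, few_shot_examples)
print(instruction_str)
```

The resulting `instruction_str` can be passed to `partial_format` the same way as in the documentation code, with the concatenated dataframe taking the place of `df`.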
NOTE: This worked well enough for a POC, but for the final product I am working on a custom solution that is still in the design phase. If you have a better idea, please get in touch.
Upvotes: 0