NiMa

Reputation: 173

langchain with_structured_output Parsing a dictionary of lists of custom classes

I am trying to use LangChain to extract structured output from unstructured text with LLM tool calling.

I have code that works:

  import os
  from typing import List
  from pydantic import BaseModel, Field
  from langchain_openai import ChatOpenAI

  model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

  class A(BaseModel):
    a_1: str
    a_2: str
    r: str

  class B(BaseModel):
    a: str
    b_1: str
    b_2: str
    r: str
  
  class C(BaseModel):
    ccc: List[A]
    ppp: List[B]

  structured_llm = model.with_structured_output(C)

  response = structured_llm.invoke(prompt)

I want "a" to be the key in ppp, but the code below (using Dict) fails:

  import os
  from typing import Dict, List
  from pydantic import BaseModel, Field
  from langchain_openai import ChatOpenAI

  model = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.0)

  class A(BaseModel):
    a_1: str
    a_2: str
    r: str

  class B(BaseModel):
    b_1: str
    b_2: str
    r: str
  
  class C(BaseModel):
    ccc: List[A]
    ppp: Dict[str, List[B]]


  structured_llm = model.with_structured_output(C)

  response = structured_llm.invoke(prompt)

Error:

ValidationError: 1 validation error for C
ppp
  Field required [type=missing, input_value={'ccc': [{'a_1': 'Price',...tant to Battery Life'}]}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing

Any clue how to format it as a Dict?

Upvotes: 2

Views: 234

Answers (1)

Jakob Riedle

Reputation: 2018

I got exactly the same error message when trying to do the same thing. My first idea was to rewrite the Dict[str, str] as List[Tuple[str, str]], but that produced a similar issue.
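To compare what each annotation asks the model for, you can print the JSON schema Pydantic generates for the field. This is only a diagnostic sketch, assuming Pydantic v2 (the error URL above points at 2.9). The class names DictVariant and TupleVariant are made up for illustration, and the output does not by itself prove why the tool call drops the field.

```python
from typing import Dict, List, Tuple

from pydantic import BaseModel


class B(BaseModel):
    b_1: str
    b_2: str
    r: str


class DictVariant(BaseModel):
    # Hypothetical name; mirrors the failing `ppp` field from the question.
    ppp: Dict[str, List[B]]


class TupleVariant(BaseModel):
    # Hypothetical name; mirrors the List[Tuple[str, str]] attempt.
    ppp: List[Tuple[str, str]]


# Pull out only the schema of the `ppp` field in each variant.
dict_schema = DictVariant.model_json_schema()["properties"]["ppp"]
tuple_schema = TupleVariant.model_json_schema()["properties"]["ppp"]

print(dict_schema)   # an object described via "additionalProperties"
print(tuple_schema)  # an array whose items are described via "prefixItems"
```

Neither schema names its properties explicitly, which is the main structural difference from a model with fixed key and value fields.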

What ended up working for me was to create another model with two attributes acting as a key-value pair, and to use a list of that model:

from typing import Generic, List, TypeVar
from pydantic import BaseModel

# Parameterized key-value pair model
TKey = TypeVar("TKey")
TValue = TypeVar("TValue")
class KeyValuePair(BaseModel, Generic[TKey, TValue]):
    key: TKey
    value: TValue

# A and B are the models defined in the question
class C(BaseModel):
    ccc: List[A]
    ppp: List[KeyValuePair[str, List[B]]]
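If you still want a real dictionary afterwards, the list of pairs can be collapsed once the response has been parsed. The sketch below assumes Pydantic v2 and uses a made-up payload in place of the actual structured_llm.invoke(prompt) result. The ccc field is left out for brevity, and ppp_by_key is a name chosen for illustration.

```python
from typing import Dict, Generic, List, TypeVar

from pydantic import BaseModel

TKey = TypeVar("TKey")
TValue = TypeVar("TValue")


class B(BaseModel):
    b_1: str
    b_2: str
    r: str


class KeyValuePair(BaseModel, Generic[TKey, TValue]):
    key: TKey
    value: TValue


class C(BaseModel):
    # `ccc` from the question is omitted to keep the sketch short.
    ppp: List[KeyValuePair[str, List[B]]]


# Made-up payload standing in for what the structured LLM call might return.
response = C.model_validate(
    {"ppp": [{"key": "x", "value": [{"b_1": "p", "b_2": "q", "r": "s"}]}]}
)

# Collapse the pairs into the Dict[str, List[B]] shape originally wanted.
ppp_by_key: Dict[str, List[B]] = {pair.key: pair.value for pair in response.ppp}
print(ppp_by_key)
```

Note that if the model returns the same key twice, the comprehension keeps only the last pair, so you may want to merge the lists instead.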

Let me know if that works for you.

Upvotes: 1
