Reputation: 2370
I fine-tuned a model (https://huggingface.co/decapoda-research/llama-7b-hf) using peft and LoRA and saved it as https://huggingface.co/lucas0/empath-llama-7b. Now I'm getting "Pipeline cannot infer suitable model classes from" when trying to use it together with langchain and a Chroma vector DB:
from langchain.embeddings import HuggingFaceHubEmbeddings
from langchain import PromptTemplate, HuggingFaceHub, LLMChain
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma
repo_id = "sentence-transformers/all-mpnet-base-v2"
embedder = HuggingFaceHubEmbeddings(
    repo_id=repo_id,
    task="feature-extraction",
    huggingfacehub_api_token="XXXXX",
)
comments = ["foo", "bar"]
embeddings = embedder.embed_documents(texts=comments)
docsearch = Chroma.from_texts(comments, embedder).as_retriever()
#docsearch = Chroma.from_documents(texts, embeddings)
llm = HuggingFaceHub(repo_id='lucas0/empath-llama-7b', huggingfacehub_api_token='XXXXX')
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch, return_source_documents=False)
q = input("input your query:")
result = qa.run(query=q)
print(result)
Is anyone able to tell me how to fix this? Is it an issue with the model card? I was facing issues with the missing config.json file and ended up just copying the config.json from the model I used as the base for the LoRA fine-tuning. Could that be the origin of the issue? If so, how do I generate the correct config.json without having to get the original LLaMA weights?
Also, is there a way of loading several sentences into a custom HF model (not only OpenAI, as the tutorials show) without using vector DBs?
Thanks!
The same issue happens when trying to run the hosted inference API on the model's HF page.
Upvotes: 2
Views: 12883
Reputation: 122218
Before using the langchain API with the Hugging Face model, you should try to load the model directly in Hugging Face transformers:
from transformers import AutoModel
model = AutoModel.from_pretrained('lucas0/empath-llama-7b')
And that'll throw some errors:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-2-1b9ce76f5421> in <cell line: 3>()
1 from transformers import AutoModel
2
----> 3 model = AutoModel.from_pretrained('lucas0/empath-llama-7b')
1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2553 )
2554 else:
-> 2555 raise EnvironmentError(
2556 f"{pretrained_model_name_or_path} does not appear to have a file named"
2557 f" {_add_variant(WEIGHTS_NAME, variant)}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or"
OSError: lucas0/empath-llama-7b does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
Then, looking into the model files at https://huggingface.co/lucas0/empath-llama-7b/tree/main, it turns out that only the adapter model is saved and not the full model, so AutoModel is throwing tantrums.
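As a quick sanity check (this snippet is an addition, not part of the original answer), you can list the repo contents with huggingface_hub; for a PEFT/LoRA checkpoint you'd expect adapter files rather than pytorch_model.bin:
from huggingface_hub import list_repo_files
# Shows what was actually uploaded, e.g. adapter_config.json / adapter_model.bin
# instead of the full pytorch_model.bin that AutoModel is looking for.
print(list_repo_files("lucas0/empath-llama-7b"))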
To load an adapted model, you have to load the base model and the PEFT (adapter) model separately. First the installs (restart the runtime after installing, if needed):
! pip install -U peft accelerate
! pip install -U sentencepiece
! pip install -U transformers
Then, to load the model, take a look at the guanaco example in Trying to install guanaco (pip install guanaco) for a text classification model but getting error (you will need a GPU runtime):
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer
model_name = "decapoda-research/llama-7b-hf"
adapters_name = 'lucas0/empath-llama-7b'
print(f"Starting to load the model {model_name} into memory")
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0}
)
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()
tok = LlamaTokenizer.from_pretrained(model_name)
tok.bos_token_id = 1
stop_token_ids = [0]
print(f"Successfully loaded the model {model_name} into memory")
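Since merge_and_unload() folds the adapter weights back into the base model, you could also save or push the merged model at this point (a suggestion on top of the answer above; the repo name below is hypothetical). That gives you a full checkpoint with a proper config.json, so plain AutoModel.from_pretrained and the hosted inference API can load it:
# Write the merged weights plus config.json and tokenizer files locally.
m.save_pretrained("empath-llama-7b-merged")
tok.save_pretrained("empath-llama-7b-merged")
# Optionally publish it under your own account (hypothetical repo id).
# m.push_to_hub("lucas0/empath-llama-7b-merged")
# tok.push_to_hub("lucas0/empath-llama-7b-merged")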
Now that you can load the model you've adapted/fine-tuned in Hugging Face transformers, you can try it with langchain. Before that, we have to dig into the langchain code. To use a prompt with an HF model, users are told to do this:
from langchain import PromptTemplate, LLMChain, HuggingFaceHub
template = """ Hey llama, you like to eat quinoa. Whatever question I ask you, you reply with "Waffles, waffles, waffles!".
Question: {input} Answer: """
prompt = PromptTemplate(template=template, input_variables=["input"])
model = HuggingFaceHub(repo_id="facebook/mbart-large-50",
                       model_kwargs={"temperature": 0, "max_length": 200})
chain = LLMChain(prompt=prompt, llm=model)
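(For completeness, and assuming a valid Hub token is configured in your environment, you would run such a chain with something like the line below; the question text is just a placeholder.)
print(chain.run(input="Who is Princess Momo?"))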
But when we look at the HuggingFaceHub object, it isn't just a vanilla AutoModel from Hugging Face transformers.
When we look at https://github.com/hwchase17/langchain/blob/master/langchain/chains/llm.py, we see that the llm=... argument is loaded with some wrapper class, so we dig deeper into langchain's HuggingFaceHub object at https://github.com/hwchase17/langchain/blob/master/langchain/llms/huggingface_hub.py
The HuggingFaceHub object wraps over huggingface_hub.inference_api.InferenceApi for the text-generation, text2text-generation or summarization tasks.
And HuggingFaceHub looks like a somewhat spaghetti-like object that inherits from the LLM object, https://github.com/hwchase17/langchain/blob/master/langchain/llms/base.py#L453
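To make that concrete (an illustrative sketch added here, not part of the original answer), langchain's HuggingFaceHub is essentially doing something like the call below, which is why it hits the same error as the hosted widget on the model page: the Inference API can't load an adapter-only repo either.
from huggingface_hub.inference_api import InferenceApi
# Roughly what langchain's HuggingFaceHub does internally; the token is a placeholder.
client = InferenceApi(repo_id="lucas0/empath-llama-7b", token="XXXXX", task="text-generation")
# Expect the same failure the question reports on the model's hosted API page.
print(client(inputs="Who is Princess Momo?"))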
To summarize this a little, we want to:
- use HuggingFaceHub with the langchain API,
- HuggingFaceHub is actually a wrapper over the huggingface_hub.inference_api.InferenceApi,
- the HuggingFaceHub object is a subclass of llms.base.LLM.
Given that knowledge of the HuggingFaceHub object, we now have several options:
Opinion: The easiest way around it is to avoid langchain entirely. Since it's a wrapper around other things, you can write your own customized wrapper that skips the levels of inheritance langchain created to wrap around as many tools as it can.
Ideally: Ask the langchain developers/maintainers to support loading PEFT/adapter models and write another subclass for them.
Practical: Let's hack the thing and write our own LLM subclass.
Practical solution:
Let's try to hack up a new LLM subclass:
from typing import Any, Dict, List, Mapping, Optional
from pydantic import Extra, root_validator
from transformers import pipeline
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain import PromptTemplate, LLMChain

class HuggingFaceHugs(LLM):
    pipeline: Any

    class Config:
        """Configuration for this pydantic object."""
        extra = Extra.forbid

    def __init__(self, model, tokenizer, task="text-generation"):
        super().__init__()
        self.pipeline = pipeline(task, model=model, tokenizer=tokenizer)

    @property
    def _llm_type(self) -> str:
        """Return type of llm."""
        return "huggingface_hub"

    def _call(self, prompt, stop: Optional[List[str]] = None, run_manager: Optional[CallbackManagerForLLMRun] = None):
        # Run the inference.
        text = self.pipeline(prompt, max_length=100)[0]['generated_text']
        # @alvas: I've totally no idea what this does in langchain, so I copied it verbatim.
        if stop is not None:
            # This is a bit hacky, but I can't figure out a better way to enforce
            # stop tokens when making calls to huggingface_hub.
            text = enforce_stop_tokens(text, stop)
        print(text)
        return text[len(prompt):]
template = """ Hey llama, you like to eat quinoa. Whatever question I ask you, you reply with "Waffles, waffles, waffles!".
Question: {input} Answer: """
prompt = PromptTemplate(template=template, input_variables=["input"])
hf_model = HuggingFaceHugs(model=m, tokenizer=tok)
chain = LLMChain(prompt=prompt, llm=hf_model)
chain("Who is Princess Momo?")
Phew, langchain didn't complain... and here's the output:
{'input': 'Who is Princess Momo?',
'text': ' She is a princess. She is a princess. She is a princess. She is a princess. She is a princess. She is a princess. She is a princess. She is'}
Epilogue: Apparently this llama model doesn't understand that all it needs to do is reply "Waffles, waffles, waffles!".
See https://colab.research.google.com/drive/1l2GiSSPbajVyp2Nk3CFT4t3uH6-5TiBe?usp=sharing
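If you want to tame the repetition in that output, one option (a tweak on top of the answer above, using standard transformers generation parameters) is to pass sampling and repetition settings through the pipeline call inside HuggingFaceHugs._call, e.g.:
# A possible variant of the pipeline call in _call:
text = self.pipeline(
    prompt,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
)[0]['generated_text']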
Upvotes: 14