Reputation: 83367
How can I run inference on the MPT-7B language model? The MPT-7B documentation page on Hugging Face doesn't explain how to run inference (i.e., given a few words, predict the next few words).
Upvotes: 1
Views: 516
Reputation: 83367
https://huggingface.co/mosaicml/mpt-30b gives example code for inference:
import torch
import transformers
from transformers import AutoTokenizer, pipeline

# Load the tokenizer and the model (MPT ships custom model code, hence trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b')
model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-30b',
    trust_remote_code=True
)
model.to('cuda')  # the inputs below are moved to the GPU, so the model must be there too

with torch.autocast('cuda', dtype=torch.bfloat16):
    inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

# or using the HF pipeline
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Here is a recipe for vegan banana bread:\n',
             max_new_tokens=100,
             do_sample=True,
             use_cache=True))
Just replace mpt-30b with mpt-7b if you wish to use MPT-7B.
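For MPT-7B specifically, the swap can be sketched as a small helper. This is only an illustration: `mpt_checkpoint` and `generate` are hypothetical names that are not part of transformers, and the sketch assumes the `mosaicml/mpt-7b` repository provides tokenizer files loadable via `AutoTokenizer`, as the mpt-30b one does. It also assumes a CUDA GPU with enough memory for the bfloat16 weights.

```python
def mpt_checkpoint(size: str = "7b") -> str:
    """Build the Hub id of an MPT checkpoint, e.g. 'mosaicml/mpt-7b' (hypothetical helper)."""
    return f"mosaicml/mpt-{size}"


def generate(prompt: str, size: str = "7b", max_new_tokens: int = 100,
             device: str = "cuda") -> str:
    """Continue `prompt` with the chosen MPT checkpoint (hypothetical helper)."""
    # Heavy dependencies are imported lazily, only when generation is requested.
    import torch
    import transformers

    name = mpt_checkpoint(size)
    # Assumption: the checkpoint repo ships its own tokenizer files.
    tokenizer = transformers.AutoTokenizer.from_pretrained(name)
    model = transformers.AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.bfloat16,   # load the weights in bfloat16 to save memory
        trust_remote_code=True,       # MPT uses custom model code from the Hub
    ).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():             # inference only, no gradients needed
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]


# Example call (downloads the weights on first use):
# print(generate("Here is a recipe for vegan banana bread:\n"))
```

Calling `generate(prompt, size="30b")` would load the larger checkpoint instead, since only the repository name changes.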
Upvotes: 0