StackOverflow Questions for Tag: vllm

K_Augus

Reputation: 412

How to increase the generation throughput of Qwen-0.5B in vLLM

Score: 0

Views: 12

Answers: 0

Charlie Parker

Reputation: 5199

How can I ensure deterministic text generation with vLLM, and does it support a global set_seed?

Score: 0

Views: 56

Answers: 1

Carter Wang

Reputation: 27

How to get TTFT and TPOT in vLLM?

Score: 0

Views: 28

Answers: 0

Charlie Parker

Reputation: 5199

How to install torch==2.1.2+cu118 with pip on Linux?

Score: 2

Views: 12631

Answers: 1

cpchung

Reputation: 844

How to display progress when building from source with `pip install -e .`

Score: 0

Views: 19

Answers: 0

Charlie Parker

Reputation: 5199

Why does moving ML model initialization into a function prevent GPU OOM errors when del, gc.collect(), and torch.cuda.empty_cache() fail?

Score: 0

Views: 179

Answers: 0

Eric

Reputation: 21

Max len error when using a Hugging Face model

Score: 1

Views: 826

Answers: 1

Javide

Reputation: 2657

Cannot submit chat request to VLLM Pixtral in Python using MistralAI

Score: 0

Views: 44

Answers: 0

Matthew Dickson

Reputation: 21

Getting Serialisation Error on Initial Call to Class Function Decorated with Ray.remote

Score: 0

Views: 26

Answers: 0

Laz

Reputation: 1

Host a Model with vllm for RAG

Score: 0

Views: 629

Answers: 1

Dharsann

Reputation: 21

XFormersMetadata.__init__() got an unexpected keyword argument 'is_prompt'

Score: 2

Views: 341

Answers: 0

Charlie Parker

Reputation: 5199

VLLM Objects Cause Memory Errors When Created in a Function even when explicitly clearing GPU cache, only sharing ref makes code not crash

Score: 1

Views: 1757

Answers: 1

Jijo Jose Varghese

Reputation: 1

Failed to build installable wheels for some pyproject.toml based projects (vllm)

Score: 0

Views: 583

Answers: 0

Harsh Soni

Reputation: 1

How to integrate Guardrails AI with vLLM or Hugging Face models using LlamaIndex

Score: 0

Views: 78

Answers: 0

Ali Ait-Bachir

Reputation: 750

How can I run a vLLM model on a multi-GPU server?

Score: 1

Views: 2962

Answers: 1

Cihan Yalçın

Reputation: 53

Stream output using VLLM

Score: 0

Views: 2272

Answers: 1

Yezun Chung

Reputation: 11

vLLM infinite waiting error in for loop (multi-GPU)

Score: 1

Views: 758

Answers: 0

ganto

Reputation: 222

How to do distributed batch inference using tensor parallelism with Ray?

Score: 0

Views: 162

Answers: 0

anuj0456

Reputation: 27

How can I pass a 4-bit quantized model, quantized using bitsandbytes, to vLLM?

Score: 0

Views: 265

Answers: 0

fmartinac

Reputation: 1

Mixtral 8x7b, am I running it wrong?

Score: 0

Views: 1011

Answers: 1
