StackOverflow Questions for Tag: vllm

K_Augus

Reputation: 412

How to increase the generation throughput of Qwen-0.5B of vllm

large-language-model throughput vllm

Score: 0

Answers: 0

Charlie Parker

Reputation: 5199

How can I ensure deterministic text generation with vLLM, and does it support a global set_seed?

vllm

Score: 0

Answers: 1

Carter Wang

Reputation: 27

How to get TTFT and TPOT in VLLM?

vllm

Score: 0

Answers: 0

Charlie Parker

Reputation: 5199

How to install pip install torch==2.1.2+cu118 in linux?

pytorch huggingface-transformers dspy vllm

Score: 2

Answers: 1

cpchung

Reputation: 844

how to display progress when building from source in `pip install -e .`

setuptools setup.py vllm

Score: 0

Answers: 0

Charlie Parker

Reputation: 5199

Why does moving ML model initialization into a function prevent GPU OOM errors when del, gc.collect(), and torch.cuda.empty_cache() fail?

python-3.x nlp garbage-collection huggingface-transformers vllm

Score: 0

Answers: 0

Eric

Reputation: 21

Max len error when using Huggingface model

langchain large-language-model vllm

Score: 1

Answers: 1

Javide

Reputation: 2657

Cannot submit chat request to VLLM Pixtral in Python using MistralAI

python large-language-model vllm pixtral

Score: 0

Answers: 0

Matthew Dickson

Reputation: 21

Getting Serialisation Error on Initial Call to Class Function Decorated with Ray.remote

serialization ray vllm

Score: 0

Answers: 0

Laz

Reputation: 1

Host a Model with vllm for RAG

chatbot langchain large-language-model rag vllm

Score: 0

Answers: 1

Dharsann

Reputation: 21

XFormersMetadata.__init__() got an unexpected keyword argument 'is_prompt'

python machine-learning artificial-intelligence large-language-model vllm

Score: 2

Answers: 0

Charlie Parker

Reputation: 5199

VLLM Objects Cause Memory Errors When Created in a Function even when explicitly clearing GPU cache, only sharing ref makes code not crash

python machine-learning pytorch gpu vllm

Score: 1

Answers: 1

Jijo Jose Varghese

Reputation: 1

Failed to build installable wheels for some pyproject.toml based projects (vllm)

pyproject.toml vllm

Score: 0

Answers: 0

Harsh Soni

Reputation: 1

How to Integrate Guardrails ai with vllm or hugging face models using llama index

open-source large-language-model huggingface guard vllm

Score: 0

Answers: 0

Ali Ait-Bachir

Reputation: 750

How can run vLLM model on a multi GPU server

python gpu large-language-model vllm

Score: 1

Answers: 1

Cihan Yalçın

Reputation: 53

Stream output using VLLM

python deployment large-language-model retrieval-augmented-generation vllm

Score: 0

Answers: 1

Yezun Chung

Reputation: 11

vllm infinite waiting error in for loop (multi-gpu)

multi-gpu vllm

Score: 1

Answers: 0

ganto

Reputation: 222

How to do distributed batch inference using tensor parallelism with Ray?

large-language-model ray vllm

Score: 0

Answers: 0

anuj0456

Reputation: 27

how can i pass a 4bit quantized model, quantized using bitsandbytes, to vllm?

langchain large-language-model quantization mistral-7b vllm

Score: 0

Answers: 0

fmartinac

Reputation: 1

Mixtral 8x7b, am I running it wrong?

gpu artificial-intelligence vllm mixtral-8x7b

Score: 0

Answers: 1
