StackOverflow Questions for Tag: tokenize

ERROR: Could not find a version that satisfies the requirement pyonmttok ERROR: No matching distribution found for pyonmttok

Score: -1

Views: 1069

Answers: 2

Ellster

Reputation: 1

Fixing Missing NLTK Tokenizer Resources

Score: 0

Views: 22

Answers: 2

Bill the Lizard

Reputation: 405995

How do I tokenize a string in C++?

Score: 481

Views: 671361

Answers: 38

Towsif Ahamed Labib

Reputation: 716

How can I accurately count tokens for Llama3/DeepSeek r1 prompts when Groq API reports “Request too large”?

Score: 1

Views: 1452

Answers: 0

green_ruby

Reputation: 21

How do I remove escape characters from output of nltk.word_tokenize?

Score: 0

Views: 33

Answers: 0

Bram Vanroy

Reputation: 28505

Streaming with Ollama + Langchain: incorrect spacing

Score: -1

Views: 58

Answers: 0

SiSi

Reputation: 121

How to use tiktoken on an offline computer

Score: 9

Views: 32181

Answers: 8

pepr

Reputation: 20794

PunktTokenizer does not work with Russian `я.`

Score: 0

Views: 32

Answers: 0

pepr

Reputation: 20794

nltk add or remove some abbreviations for the specific project not working

Score: 0

Views: 26

Answers: 0

aguadoe

Reputation: 168

How to detect out-of-vocabulary words in a prompt

Score: 0

Views: 24

Answers: 0

Mee

Reputation: 1651

The size of tensor a (707) must match the size of tensor b (512) at non-singleton dimension 1

Score: 20

Views: 45168

Answers: 3

genekogan

Reputation: 721

get indices of original text from nltk word_tokenize

Score: 11

Views: 9771

Answers: 3

Ron Libman

Reputation: 31

CUDA out of memory while training

Score: 0

Views: 34

Answers: 0

Abdelrahman Mattar

Reputation: 1

Incorrect input token count

Score: 0

Views: 17

Answers: 0

cd1

Reputation: 16534

How to split a string in shell and get the last field

Score: 412

Views: 543502

Answers: 17

hanugm

Reputation: 1417

CLIP model from `open_clip` module returns single embedding for 77 tokens

Score: 0

Views: 1400

Answers: 2

andre

Reputation: 7249

Boost::Split using whole string as delimiter

Score: 12

Views: 10269

Answers: 3

Swaraj Gaikwad

Reputation: 11

tokenizer.train_from_iterator throwing TypeError: expected string or buffer

Score: 1

Views: 21

Answers: 0

Viacheslav Ravdin

Reputation: 133

Parsing PHP file in order to get an array of parameters

Score: 0

Views: 108

Answers: 2

triandicAnt

Reputation: 1378

Latent Dirichlet allocation (LDA) performance by limiting word size for Corpus Documents

Score: 0

Views: 895

Answers: 3
