Shamoon

With the HuggingFace transformers library, how can I return multiple samples when generating text?

I'm going off of https://github.com/cortexlabs/cortex/blob/master/examples/pytorch/text-generator/predictor.py

But if I pass num_samples=5, I get:

    generated = torch.cat((generated, next_token.unsqueeze(0)), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Got 5 and 1 in dimension 0
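
For reference, the mismatch can be reproduced in isolation (a minimal sketch, with shapes taken from the code below):

import torch

generated = torch.zeros(5, 7, dtype=torch.long)  # [num_samples, seq_len] after repeat
next_token = torch.zeros(1, dtype=torch.long)    # one token from multinomial(num_samples=1)
# cat along dim=1 requires the remaining dims to match: 5 vs 1 in dimension 0
torch.cat((generated, next_token.unsqueeze(0)), dim=1)  # raises the RuntimeError above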

The code is:

def sample_sequence(
    model,
    length,
    context,
    num_samples=1,
    temperature=1,
    top_k=0,
    top_p=0.9,
    repetition_penalty=1.0,
    device="cpu",
):
    context = torch.tensor(context, dtype=torch.long, device=device)
    context = context.unsqueeze(0).repeat(num_samples, 1)
    print('context_size', context.shape)
    generated = context
    print('context', context)
    with torch.no_grad():
        for _ in trange(length):
            inputs = {"input_ids": generated}
            outputs = model(
                **inputs
            )  # Note: we could also use 'past' with GPT-2/Transfo-XL/XLNet/CTRL (cached hidden-states)
            next_token_logits = outputs[0][0, -1, :] / (temperature if temperature > 0 else 1.0)

            # repetition penalty from CTRL (https://arxiv.org/abs/1909.05858)
            for _ in set(generated.view(-1).tolist()):
                next_token_logits[_] /= repetition_penalty

            filtered_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=top_p)
            if temperature == 0:  # greedy sampling:
                next_token = torch.argmax(filtered_logits).unsqueeze(0)
            else:
                next_token = torch.multinomial(F.softmax(filtered_logits, dim=-1), num_samples=1)
            generated = torch.cat((generated, next_token.unsqueeze(0)), dim=1)
    return generated

Answers (1)

cronoik

As far as I can see, this code doesn't return multiple samples, but you can make it do so with a few small adjustments.

This line already uses torch.multinomial, but it requests only one sample:

next_token = torch.multinomial(F.softmax(filtered_logits, dim=-1), num_samples=1)

Change it to:

next_token = torch.multinomial(F.softmax(filtered_logits, dim=-1), num_samples=num_samples)
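
As a quick sanity check of the return shapes (a sketch, assuming a GPT-2-sized vocabulary):

import torch
import torch.nn.functional as F

logits = torch.randn(50257)                           # 1-D logits over the vocabulary
probs = F.softmax(logits, dim=-1)
print(torch.multinomial(probs, num_samples=1).shape)  # torch.Size([1])
print(torch.multinomial(probs, num_samples=5).shape)  # torch.Size([5])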

You also need to change how the result is constructed. The following line concatenates next_token onto the generated sequences. Since you now get num_samples tokens per step, you have to unsqueeze them along dimension 1 so they line up with the batch dimension (see the shape walk-through after the change):

generated = torch.cat((generated, next_token.unsqueeze(0)), dim=1)

Change it to:

generated = torch.cat((generated, next_token.unsqueeze(1)), dim=1)
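
A quick shape walk-through with illustrative values:

import torch

num_samples, seq_len = 5, 7
generated = torch.zeros(num_samples, seq_len, dtype=torch.long)    # [5, 7]
next_token = torch.zeros(num_samples, dtype=torch.long)            # [5]
generated = torch.cat((generated, next_token.unsqueeze(1)), dim=1)
print(generated.shape)                                             # torch.Size([5, 8])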

The whole function should now look like this (imports added for completeness; top_k_top_p_filtering is the helper defined in the original example script):

import torch
import torch.nn.functional as F
from tqdm import trange

def sample_sequence(
    model,
    length,
    context,
    num_samples=1,
    temperature=1,
    top_k=0,
    top_p=0.9,
    repetition_penalty=1.0,
    device="cpu",
):
    context = torch.tensor(context, dtype=torch.long, device=device)
    context = context.unsqueeze(0).repeat(num_samples, 1)
    generated = context
    with torch.no_grad():
        for _ in trange(length):
            inputs = {"input_ids": generated}
            outputs = model(
                **inputs
            )  # Note: we could also use 'past' with GPT-2/Transfo-XL/XLNet/CTRL (cached hidden-states)
            # logits for the last position; note this reads only the *first*
            # sequence in the batch (index 0), so all samples share one distribution
            next_token_logits = outputs[0][0, -1, :] / (temperature if temperature > 0 else 1.0)

            # repetition penalty from CTRL (https://arxiv.org/abs/1909.05858)
            for _ in set(generated.view(-1).tolist()):
                next_token_logits[_] /= repetition_penalty

            filtered_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=top_p)
            if temperature == 0:  # greedy sampling:
                next_token = torch.argmax(filtered_logits).unsqueeze(0)
            else:
                next_token = torch.multinomial(F.softmax(filtered_logits, dim=-1), num_samples=num_samples)
            generated = torch.cat((generated, next_token.unsqueeze(1)), dim=1)
    return generated

Last but not least, you have to change your tokenizer.decode call to tokenizer.batch_decode, as the return value now contains multiple samples:

tokenizer.batch_decode(output.tolist(), clean_up_tokenization_spaces=True, skip_special_tokens=True)
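
A hypothetical end-to-end call (the "gpt2" checkpoint and the prompt are assumptions for illustration):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = tokenizer.encode("The weather today is")
output = sample_sequence(model, length=20, context=context, num_samples=5)
for text in tokenizer.batch_decode(output.tolist(), clean_up_tokenization_spaces=True, skip_special_tokens=True):
    print(text)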

Something you have to think about yourself is what you want to do when there is no valid next_token (one possible guard is sketched below). Currently you will receive an error message like:

RuntimeError: invalid multinomial distribution (with replacement=False, not enough non-negative category to sample)
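
One possible guard (a sketch; safe_sample is a hypothetical helper, not part of the original code) falls back to greedy selection when fewer than num_samples categories have non-zero probability:

import torch
import torch.nn.functional as F

def safe_sample(filtered_logits, num_samples):
    # multinomial with replacement=False needs at least num_samples
    # categories with non-zero probability; top-k/top-p filtering sets
    # excluded logits to -inf, which softmax turns into zeros
    probs = F.softmax(filtered_logits, dim=-1)
    if int((probs > 0).sum()) < num_samples:
        return torch.argmax(filtered_logits).repeat(num_samples)
    return torch.multinomial(probs, num_samples=num_samples)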

Another thing you have to consider is whether their code is even correct. In the few tests I have conducted, it felt like the quality of the generated sentences decreased as num_samples increased (i.e. maybe the quality is better when you use a simple loop that calls sample_sequence multiple times, as sketched below). I haven't worked with GPT-2 yet, so I can't help you further here.
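
Continuing the hypothetical setup from the usage example above, the loop alternative would look like this:

texts = []
for _ in range(5):
    output = sample_sequence(model, length=20, context=context, num_samples=1)
    texts.append(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))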
