Text Generation Settings

By default, a text generation algorithm called “greedy” is used. At each step, this algorithm simply picks the most likely next word. However, there are more sophisticated ways to perform text generation, as described in this article by Hugging Face.
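To make the greedy rule concrete, here is a minimal sketch in plain Python. The `greedy_pick` helper and the toy probabilities are hypothetical and chosen only for illustration; they are not part of Happy Transformer.

```python
def greedy_pick(probs):
    """Return the candidate with the highest probability."""
    return max(probs, key=probs.get)


# Made-up probabilities for the word that follows "Artificial intelligence is".
next_word_probs = {"a": 0.45, "the": 0.30, "not": 0.15, "often": 0.10}

chosen = greedy_pick(next_word_probs)
print(chosen)  # prints "a", the most likely candidate
```

Because the choice depends only on the probabilities, greedy generation returns the same text every time for the same prompt and model.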

The GENSettings class controls which algorithm is used and its settings. An instance of it is passed to the “args” parameter of HappyGeneration.generate_text().

from happytransformer import GENSettings

GENSettings() contains the fields shown in Table 1.0.

Table 1.0:

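| Parameter | Default | Definition |
|-----------|---------|------------|
| min_length | 10 | Minimum number of generated tokens |
| max_length | 50 | Maximum number of generated tokens |
| do_sample | False | When True, the next word is sampled according to its conditional probability instead of always taking the most likely one |
| early_stopping | False | When True, generation finishes if the EOS token is reached |
| num_beams | 1 | Number of candidate sequences (beams) kept at each step of beam search; 1 means no beam search |
| temperature | 1.0 | How sensitive the algorithm is to selecting low-probability options; lower values favour high-probability options |
| top_k | 50 | How many of the most likely candidates are considered when performing sampling |
| top_p | 1.0 | When sampling, only the smallest set of most likely tokens whose probabilities add up to at least top_p is considered |
| no_repeat_ngram_size | 0 | The size of an n-gram that cannot occur more than once (0 = no restriction) |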
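The sampling parameters in the table are easier to follow on a toy distribution. The sketch below is an illustration in plain Python, not Happy Transformer's or Hugging Face's implementation: the helper names (`softmax_with_temperature`, `top_k_filter`, `top_p_filter`) and the logits are hypothetical, and treating `k=0` as "no top-k filtering" is an assumption that mirrors how the examples on this page use `top_k=0`.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalise (assumption: k=0 disables)."""
    if k <= 0:
        return dict(probs)
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose probabilities reach p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


# Made-up logits for four candidate next tokens.
logits = {"a": 2.0, "the": 1.0, "not": 0.0, "often": -1.0}
base = softmax_with_temperature(logits, temperature=1.0)
sharp = softmax_with_temperature(logits, temperature=0.7)

print(sharp["a"] > base["a"])           # a lower temperature sharpens the distribution
print(sorted(top_k_filter(base, 2)))    # only the two most likely tokens remain
print(sorted(top_p_filter(base, 0.8)))  # tokens covering at least 80% of the mass
```

With the default `top_p=1.0` nothing is removed by the top-p step, which is why the defaults in Table 1.0 only restrict sampling through `top_k`.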

Examples 1.2:

from happytransformer import HappyGeneration, GENSettings

#---------------------------------------------------
happy_gen = HappyGeneration()

# Greedy decoding (the default), with repeated 2-grams disallowed
greedy_settings = GENSettings(no_repeat_ngram_size=2, max_length=10)
output_greedy = happy_gen.generate_text(
    "Artificial intelligence is ",
    args=greedy_settings)

# Beam search with 5 beams
beam_settings = GENSettings(num_beams=5, max_length=10)
output_beam_search = happy_gen.generate_text(
    "Artificial intelligence is ",
    args=beam_settings)

# Sampling with top-k filtering turned off (top_k=0)
generic_sampling_settings = GENSettings(
    do_sample=True, top_k=0, temperature=0.7, max_length=10)
output_generic_sampling = happy_gen.generate_text(
    "Artificial intelligence is ",
    args=generic_sampling_settings)

# Top-k sampling
top_k_sampling_settings = GENSettings(
    do_sample=True, top_k=50, temperature=0.7, max_length=10)
output_top_k_sampling = happy_gen.generate_text(
    "Artificial intelligence is ",
    args=top_k_sampling_settings)

# Top-p sampling
top_p_sampling_settings = GENSettings(
    do_sample=True, top_k=0, top_p=0.8, temperature=0.7, max_length=10)
output_top_p_sampling = happy_gen.generate_text(
    "Artificial intelligence is ",
    args=top_p_sampling_settings)

# The trailing comments show example outputs; sampled outputs vary between runs.
print("Greedy:", output_greedy.text)  # a new field of research that has been gaining
print("Beam:", output_beam_search.text)  # one of the most promising areas of research in
print("Generic Sampling:", output_generic_sampling.text)  #  an area of highly promising research, and a
print("Top-k Sampling:", output_top_k_sampling.text)  # a new form of social engineering. In this
print("Top-p Sampling:", output_top_p_sampling.text)  # a new form of social engineering. In this