Saving and Loading Preprocessed Data

The provided data is preprocessed before being given to the model for training or evaluation. This step can be computationally expensive, depending on how much data you're using. With Happy Transformer, you can save the preprocessed data for training or evaluation, so that the next time you run the model, you can load the saved data and skip the preprocessing step.
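The idea behind this caching pattern can be sketched in plain Python. This is an illustration only, not Happy Transformer's actual internals; the function and file names here are hypothetical:

```python
import json
import os
import tempfile

def expensive_preprocess(text):
    # Placeholder for the costly preprocessing step (e.g. tokenization).
    return {"tokens": text.split()}

def get_data(text, save_path="", load_path=""):
    # If a load path is given, reuse the cached result and skip
    # preprocessing entirely.
    if load_path:
        with open(os.path.join(load_path, "data.json")) as f:
            return json.load(f)
    data = expensive_preprocess(text)
    # If a save path is given, cache the result for future runs.
    if save_path:
        os.makedirs(save_path, exist_ok=True)
        with open(os.path.join(save_path, "data.json"), "w") as f:
            json.dump(data, f)
    return data

cache_dir = tempfile.mkdtemp()
# First run: preprocess and save.
first = get_data("hello world", save_path=cache_dir)
# Second run: load the cached data, skipping preprocessing.
second = get_data("ignored", load_path=cache_dir)
```

On the second call the input text is ignored, which mirrors why `input_filepath` can be set to anything when loading in the examples below.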

All argument classes for training and evaluating (such as GENTrainArgs and GENEvalArgs) have the following parameters:

Table 7.1

Parameter   Default   Meaning
save_path   ""        Path to a folder in which to save the data
load_path   ""        Path to a folder from which to load the data

Example 7.0 shows how to save and then load a dataset for word prediction training.

Example 7.0

from happytransformer import HappyWordPrediction, WPTrainArgs
# ---------------------------------------------------------
happy_wp = HappyWordPrediction()

# First run: preprocess the data and save it to "data/"
train_args_1 = WPTrainArgs(save_path="data/")
happy_wp.train("data/wp/train-eval.txt", args=train_args_1)

# Second run: load the saved data, skipping preprocessing
train_args_2 = WPTrainArgs(load_path="data/")
# If you're loading data, input_filepath can be set to anything
happy_wp.train(input_filepath="", args=train_args_2)

The same pattern is used for all other training and evaluating methods. Example 7.1 shows the same process applied to saving and loading data for text classification evaluation.

Example 7.1

from happytransformer import HappyTextClassification, TCEvalArgs
# ---------------------------------------------------------
happy_tc = HappyTextClassification()

# First run: preprocess the data and save it to "data/"
eval_args_1 = TCEvalArgs(save_path="data/")
result_1 = happy_tc.eval("data/tc/train-eval.txt", args=eval_args_1)

# Second run: load the saved data, skipping preprocessing
eval_args_2 = TCEvalArgs(load_path="data/")
# If you're loading data, input_filepath can be set to anything
result_2 = happy_tc.eval(input_filepath="", args=eval_args_2)