Learning Parameters
learning_rate: How much the model’s weights are adjusted per step. Too low and the model will take a long time to learn or get stuck in a suboptimal solution. Too high can cause can divergent behaviors.
num_train_epochs: The number of times the training data is iterated over.
weight_decay: A type of regularization. It prevents weights from getting too large. Thus, preventing overfitting.
batch_size: Number of training examples used per iteration
fp16: If true, enables half precision training which saves space by using 16 bits instead of 32 to store the model’s weights. Only available when CUDA/a a GPU is being used.
eval_ratio: The ratio of data supplied to input_filepath that will be used for evaluating. If eval_filepath is supplied this argument is ignored and input_filepath is used only as train data.
save_steps: Ratio of total train step before saving occurs.
eval_steps: Ratio of total train step before evaluating occurs.
logging_steps: Ratio of total train step before logging occurs.
output_dir: An output directory where models will be saved to if save_steps is enabled. Other features that leverage this directory may be added in the future.