Reputation: 45
I have a single large .txt file and I want to split it into train, test and validation sets. Below is the command where I want to use those files. I am not sure how to go about the split.
python correct_text.py --train_path /movie_dialog_train.txt \
    --val_path /movie_dialog_val.txt \
    --config DefaultMovieDialogConfig \
    --data_reader_type MovieDialogReader \
    --model_path /movie_dialog_model
Upvotes: 3
Views: 1168
Reputation: 2014
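You can load the large file into a pandas DataFrame (say, df) using pd.read_csv(). If the file is plain text with one example per line rather than delimited columns, reading it line by line in plain Python may be simpler.
After this, you can split the DataFrame into a training set (df_train) and a validation set (df_val). If you also need a test set, split off a third part in the same way.
Now you can call df_train.to_csv() and df_val.to_csv(), passing the filename as the first argument (with index=False and header=False so that no extra column or header row is written), to generate the text files movie_dialog_train.txt and movie_dialog_val.txt.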
You can create a small Python script just for this and run it, so that your train and validation files are present before you actually run the code.
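As a rough sketch of such a script: the version below uses only the Python standard library instead of pandas, and it assumes each line of the big file is an independent example, so shuffling lines is safe. If consecutive lines belong together (for example, turns of one dialogue), you would want to split on whole groups instead. The names split_lines and split_file, the 80/10/10 ratio, the seed, and the input filename movie_dialog_all.txt are all made up for illustration.

```python
import random


def split_lines(lines, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the lines and return (train, val, test) lists.

    Assumption: every line is an independent example, so the original
    order does not matter. The fractions are illustrative defaults.
    """
    shuffled = list(lines)
    # A seeded Random instance makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def split_file(src_path, train_path, val_path, test_path, **kwargs):
    """Read src_path and write the three splits to the given paths."""
    with open(src_path, encoding="utf-8") as f:
        # Normalise line endings so a last line without a trailing
        # newline cannot get glued to another line after shuffling.
        lines = [line.rstrip("\n") + "\n" for line in f]
    train, val, test = split_lines(lines, **kwargs)
    for path, part in ((train_path, train), (val_path, val), (test_path, test)):
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(part)
    return len(train), len(val), len(test)


# Example call (hypothetical input filename):
# split_file("movie_dialog_all.txt",
#            "movie_dialog_train.txt",
#            "movie_dialog_val.txt",
#            "movie_dialog_test.txt")
```

Run it once before correct_text.py, then point --train_path and --val_path at the generated files. This reads the whole file into memory, which is usually fine for a text corpus but may need a streaming approach for very large files.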
Upvotes: 1