Cannot split a large .txt file into train, test, and validation parts for Deep Text Corrector

Question

I have a single large .txt file and I want to split it into train, test, and validation sets. Below is the command where I want to use those files. I have no idea how to do the split.

python correct_text.py --train_path /movie_dialog_train.txt \
                       --val_path /movie_dialog_val.txt \
                       --config DefaultMovieDialogConfig \
                       --data_reader_type MovieDialogReader \
                       --model_path /movie_dialog_model

Aditya Deshpande · Accepted Answer

You can load the large file into a pandas DataFrame (say, df) using pd.read_csv(). After this, you can split the DataFrame into a training part (df_train) and a validation part (df_val).

Then call to_csv() on each of the two DataFrames, passing the output filename as the argument, to generate the text files movie_dialog_train.txt and movie_dialog_val.txt.

You can create a small Python script just for this and run it, so that your train and validation files are present before you actually run the code.
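
A minimal sketch of such a script follows. It uses plain Python line handling rather than pandas, on the assumption that the file holds one example per line and that shuffling the lines is acceptable for your data. The function names, the 80/10/10 split, and the seed are illustrative choices, not part of Deep Text Corrector.

```python
import random


def split_lines(lines, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle lines and split them into train, validation, and test lists.

    Assumption: each line is an independent example, so shuffling is safe.
    """
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # seeded, so the split is reproducible
    n_val = int(len(lines) * val_frac)
    n_test = int(len(lines) * test_frac)
    val = lines[:n_val]
    test = lines[n_val:n_val + n_test]
    train = lines[n_val + n_test:]
    return train, val, test


def split_file(src_path, train_path, val_path, test_path, **kwargs):
    """Read src_path, split its lines, and write the three output files."""
    with open(src_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    train, val, test = split_lines(lines, **kwargs)
    for path, part in ((train_path, train), (val_path, val), (test_path, test)):
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in part)
    return len(train), len(val), len(test)
```

For example, calling split_file("all_dialogs.txt", "movie_dialog_train.txt", "movie_dialog_val.txt", "movie_dialog_test.txt") would write the three files, where all_dialogs.txt is a placeholder for your own input file. This version reads the whole file into memory; if the file is too large for that, or if line order matters for your data, the approach would need to be adapted.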
