SRajput

Reputation: 45

Cannot split a large .txt file into train, test and validation parts for deep text corrector

I have a single large .txt file and I want to split it into train, test and validation sets. Below is the command where I want to use those files. I have no idea how to go about the split.

python correct_text.py --train_path /movie_dialog_train.txt \
                       --val_path /movie_dialog_val.txt \
                       --config DefaultMovieDialogConfig \
                       --data_reader_type MovieDialogReader \
                       --model_path /movie_dialog_model

Upvotes: 3

Views: 1168

Answers (1)

Aditya Deshpande

Reputation: 2014

You can load the large file into a pandas DataFrame (say, df) using pd.read_csv(). After this, you can split the DataFrame into a train part (df_train) and a validation part (df_val).

Now you can call to_csv() twice, once as df_train.to_csv() and once as df_val.to_csv(), passing the output filename as the argument each time, to generate the text files movie_dialog_train.txt and movie_dialog_val.txt.

You can create a small Python script just for this and run it, so that your train and validation files are present before you actually run the code.
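A minimal sketch of such a script is below. It swaps pandas for plain line handling from the standard library, since a dialog file is not necessarily valid CSV, and it also writes a test file because the question asks for three parts. The function names (split_lines, split_file), the 80/10/10 ratio, the fixed seed and the movie_dialog prefix are all assumptions made for illustration; so is treating each line as an independent example. If consecutive lines must stay together for your reader, pass shuffle=False.

```python
import random
from pathlib import Path


def split_lines(lines, val_frac=0.1, test_frac=0.1, seed=42, shuffle=True):
    """Cut lines into (train, val, test) lists; fractions are assumptions."""
    lines = list(lines)
    if shuffle:
        # Seeded shuffle so the same input always yields the same split.
        random.Random(seed).shuffle(lines)
    n_val = int(len(lines) * val_frac)
    n_test = int(len(lines) * test_frac)
    val = lines[:n_val]
    test = lines[n_val:n_val + n_test]
    train = lines[n_val + n_test:]
    return train, val, test


def split_file(src, out_dir, prefix="movie_dialog", **kwargs):
    """Write <prefix>_train.txt, <prefix>_val.txt, <prefix>_test.txt to out_dir."""
    src, out_dir = Path(src), Path(out_dir)
    with src.open(encoding="utf-8") as f:
        # Drop blank lines and the trailing newline of each remaining line.
        lines = [line.rstrip("\n") for line in f if line.strip()]
    parts = dict(zip(("train", "val", "test"), split_lines(lines, **kwargs)))
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in parts.items():
        path = out_dir / f"{prefix}_{name}.txt"
        path.write_text("".join(line + "\n" for line in part), encoding="utf-8")
    # Return the number of lines written to each file.
    return {name: len(part) for name, part in parts.items()}
```

Calling split_file("all_dialog.txt", "data") (both names hypothetical) would leave data/movie_dialog_train.txt and data/movie_dialog_val.txt ready to pass to --train_path and --val_path. Note that this reads the whole file into memory, which is usually fine for text of a few hundred megabytes but may need a streaming approach beyond that.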

Upvotes: 1
