Questions

Reputation: 75

Types of Train test split

Can anyone explain these two different ways of calling train_test_split? I understand the first one. I saw the second one in someone's code.

train_text, temp_text, train_labels, temp_labels = train_test_split(
    df['text'], df['spam'],
    random_state=42,
    test_size=0.3,
    stratify=df['spam'])

df_train, df_valid = model_selection.train_test_split(
    text, test_size=0.1,
    random_state=42, stratify=data.spam.values)

In the second example, why did the person take only 2 variables instead of 4?

Upvotes: 1

Views: 360

Answers (2)

stan0

Reputation: 11807

why did the person take only 2 variables instead of 4

The reason is that train_test_split takes two kinds of arguments, *arrays and **options, and it returns a train split and a test split for each array it receives:

  • In the first example the *arrays are df['text'] and df['spam']. The remaining arguments (the **options) do not affect the number of results. The function receives two arrays and produces a train and a test split for each of them, so it returns 4 results.

  • In the second example only one array, text, is provided, so the function returns only two results: the train and test splits of text. The labels passed to stratify only guide the split; they are not split or returned themselves.

The documentation of the function states that it returns:

splitting: list, length=2 * len(arrays)
    List containing train-test split of inputs.
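
To make the 2 * len(arrays) rule concrete, here is a minimal sketch. The toy texts and labels lists are made up for illustration and only stand in for df['text'] and df['spam'] from the question.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for df['text'] and df['spam'].
texts = [f"message {i}" for i in range(20)]
labels = [0] * 14 + [1] * 6

# Two arrays in -> four results out (a train and a test split per array).
four = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

# One array in -> two results out. The labels given to stratify only
# guide how rows are assigned; they are not split or returned.
two = train_test_split(
    texts, test_size=0.3, random_state=42, stratify=labels
)

print(len(four), len(two))  # 4 2
```

If the labels are needed later in the second style, they would have to be passed as an array too, or recovered some other way (for example by splitting a whole DataFrame that contains both columns).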

Upvotes: 1

Pike Msonda

Reputation: 1

According to this, if you use stratify, the data will be split in a stratified fashion, using the value of stratify as the class labels. This keeps the class proportions roughly the same in the train and test splits.

Since stratify is not None in both the first and the second example, the data will be stratified in both.
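
As a rough illustration of what stratifying does, the sketch below splits a made-up dataset with 30% spam and counts the classes in each split. The data and variable names are assumptions for the example, not taken from the question.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy data: 70 ham (0) and 30 spam (1) messages.
texts = [f"message {i}" for i in range(100)]
labels = [0] * 70 + [1] * 30

train_text, test_text, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

# Count how many samples of each class landed in each split.
train_counts = Counter(train_labels)
test_counts = Counter(test_labels)

# With stratify, the spam share in each split matches the original 30%.
print(train_counts[1] / len(train_labels))
print(test_counts[1] / len(test_labels))
```

Without stratify, the spam share in each split can drift away from 30%, which matters most when one class is rare.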

Upvotes: 0
