split list of lists into 3 parts by percentage

Question

I have a list of lists like:

list = [[[bad, good],"Antonyms"], [[good, nice],"Synonyms"]]

I need to split this data into train, development, and test sets: 60%, 20%, 20%. I have no idea how to do it, and the similar questions don't answer my case. Does anybody have an idea?

Thank you

Venkatachalam · Accepted Answer

I am assuming that "Antonyms" and "Synonyms" are category labels for you. You can do the splitting with train_test_split from sklearn, applied twice.

Note: I have changed bad, good, etc. into strings. I hope that is the case in your dataset as well.

import numpy as np
from sklearn.model_selection import train_test_split

my_list = [[['bad', 'good'],"Antonyms"], [['good', 'nice'],"Synonyms"],
           [['good', 'nice'],"Synonyms"],[['good', 'nice'],"Synonyms"],
           [['good', 'nice'],"Synonyms"]]

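# dtype=object keeps each word pair as a single element; recent NumPy versions
# raise an error when building an array from ragged nested lists without it
data=np.array(my_list, dtype=object)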

print(data.shape)
#(5, 2)

X,y=data[:,0],data[:,1]

#split the data to get 60% train and 40% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
#split the test again to get 20% dev and 20% test
X_dev, X_test, y_dev, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42)

print(y_train.shape,y_dev.shape,y_test.shape)
#(3,) (1,) (1,)
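
If you would rather not depend on sklearn or numpy, the same 60/20/20 split can be sketched in plain Python by shuffling a copy of the list and slicing it. This is only a minimal sketch, not part of the approach above: the helper name split_three_ways and its parameters are made up for illustration, and it assumes rounding the slice boundaries down is acceptable (any leftover items end up in the test part).

```python
import random

def split_three_ways(items, train_frac=0.6, dev_frac=0.2, seed=42):
    """Shuffle a copy of items and slice it into train, dev and test lists.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    shuffled = list(items)                 # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a reproducible split
    n = len(shuffled)
    train_end = int(n * train_frac)            # boundary between train and dev
    dev_end = train_end + int(n * dev_frac)    # boundary between dev and test
    return shuffled[:train_end], shuffled[train_end:dev_end], shuffled[dev_end:]

my_list = [[['bad', 'good'], "Antonyms"], [['good', 'nice'], "Synonyms"],
           [['good', 'nice'], "Synonyms"], [['good', 'nice'], "Synonyms"],
           [['good', 'nice'], "Synonyms"]]

train, dev, test = split_three_ways(my_list)
print(len(train), len(dev), len(test))
```

Each item stays in its original [[word, word], label] form, so there is no need to separate X and y first. Unlike train_test_split, this sketch has no stratify option, so with few examples a category can be missing from one of the parts.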
