i.n.n.m

Reputation: 3046

Train Test Split for a list of dataframes - Pandas

I have a list of DataFrames that I want to split into train and test sets. For a single DataFrame, I can do the following.

First, compute the split point so that the last 125 rows form the test set:

split_point = len(df)- 125

and then,

train, test = df[0:split_point], df[split_point:]

This gives me the train and test split.

Now, for a list of DataFrames, I can get the split point for each DataFrame using,

split_point = [len(df)-125 for df in dfs]  ## THIS WORKS FINE

I want to get the train and test split for the whole list of DataFrames, as I did for a single DataFrame. I tried the following,

train, test = [(df[0:split_point], df[split_point:]) for df in dfs]

## AND THE FOLLOWING

train, test = [(df[0:split_point] for df in dfs),(df[split_point:]) for df in dfs]

Neither works. How can I do this?

(The DataFrames may differ in length, but that is not a concern: 125 is subtracted from each length, so the last 125 rows of every DataFrame go to the test set.)

Upvotes: 0

Views: 631

Answers (1)

Miriam Farber

Reputation: 19634

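split_point is a list rather than a single integer, so you need to index it per DataFrame and then use zip(*...) to turn the list of (train, test) pairs into two tuples: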

train, test = zip(*[(dfs[i][0:split_point[i]], dfs[i][split_point[i]:]) for i in range(len(dfs))])

Then train and test are each a tuple holding the corresponding parts of the DataFrames, in the same order as dfs.

In the above code, split_point is the list from the question:

split_point = [len(df)-125 for df in dfs]
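As a rough end-to-end check, the same idea can be run on toy data. This is only a sketch: the two small DataFrames, the column name x, and TEST_ROWS = 2 (standing in for 125) are made up for illustration, and it pairs each DataFrame with its split point via zip instead of indexing with range(len(dfs)).

```python
import pandas as pd

# Hypothetical test-set size; the question uses 125.
TEST_ROWS = 2

# Toy DataFrames of different lengths (illustrative data only).
dfs = [
    pd.DataFrame({"x": range(5)}),
    pd.DataFrame({"x": range(7)}),
]

# One split point per DataFrame, as in the question.
split_point = [len(df) - TEST_ROWS for df in dfs]

# Build (train, test) pairs, then transpose them into two tuples.
pairs = [(df[:sp], df[sp:]) for df, sp in zip(dfs, split_point)]
train, test = zip(*pairs)

train_lengths = [len(t) for t in train]
test_lengths = [len(t) for t in test]
print(train_lengths)  # [3, 5]
print(test_lengths)   # [2, 2]
```

train[i] and test[i] both come from dfs[i], so the two tuples stay aligned with the original list.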

To make this clearer, consider the following simpler example:

r = [(i, i**2) for i in range(5)]
a, b = zip(*r)

Then a is (0, 1, 2, 3, 4) and b is (0, 1, 4, 9, 16).
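If this split is needed in several places, the zip(*...) idiom could be wrapped in a small helper. The sketch below is one possible variant, not the code from the answer: the name split_tail and its n_test parameter are hypothetical, and it uses negative iloc slices so no separate split_point list is required. One behavioral difference to keep in mind: a DataFrame with fewer than n_test rows ends up entirely in the test set.

```python
import pandas as pd


def split_tail(frames, n_test):
    """Split each DataFrame into (all but the last n_test rows, last n_test rows).

    Hypothetical helper for illustration; returns two lists aligned with frames.
    """
    if n_test <= 0:
        # iloc[:-0] would select nothing, so reject non-positive sizes.
        raise ValueError("n_test must be positive")
    if not frames:
        # zip(*[]) yields nothing to unpack, so handle the empty case here.
        return [], []
    pairs = [(df.iloc[:-n_test], df.iloc[-n_test:]) for df in frames]
    train, test = zip(*pairs)
    return list(train), list(test)


# Toy data; in the question n_test would be 125.
frames = [pd.DataFrame({"x": range(4)}), pd.DataFrame({"x": range(6)})]
train, test = split_tail(frames, n_test=2)
```

Returning lists instead of tuples is a matter of taste; zip itself produces tuples, as in the example with a and b.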

Upvotes: 1
