Reputation: 3046
I have a list of DataFrames that I want to split into train and test sets. For a single DataFrame, I can do the following.
First, compute the split point, leaving the last 125 rows for the test set:
split_point = len(df)- 125
and then,
train, test = df[0:split_point], df[split_point:]
This gives me the train and test split.
Now, for a list of DataFrames, I can compute the split point for each DataFrame using,
split_point = [len(df)-125 for df in dfs] ## THIS WORKS FINE
I want to get the train and test split for the whole list of DataFrames, as I did for a single DataFrame. I tried the following,
train, test = [(df[0:split_point], df[split_point:]) for df in dfs]
## AND THE FOLLOWING
train, test = [(df[0:split_point] for df in dfs),(df[split_point:]) for df in dfs]
Neither works. How can I do this?
(The DataFrames may differ in length, but that does not matter here: 125 is subtracted from each length, so every test set gets the last 125 rows.)
Upvotes: 0
Views: 631
Reputation: 19634
You need to pair each DataFrame with its own split point and then transpose the resulting pairs with zip:
train, test = zip(*[(dfs[i][0:split_point[i]], dfs[i][split_point[i]:]) for i in range(len(dfs))])
Then train and test are each a tuple holding the corresponding parts of the DataFrames.
The code above uses the split_point list from the question:
split_point = [len(df)-125 for df in dfs]
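Put together, the approach might look like the sketch below. The sample frames, the column name "x" and the constant TEST_ROWS are made up for illustration; only dfs, split_point, train and test come from the question. It pairs each frame with its split point via zip(dfs, split_point) instead of indexing by position, which should behave the same way.

```python
import pandas as pd

TEST_ROWS = 125  # rows held out per frame, as in the question

# Hypothetical sample data: three frames of different lengths.
dfs = [pd.DataFrame({"x": range(n)}) for n in (200, 150, 130)]

# One split index per frame.
split_point = [len(df) - TEST_ROWS for df in dfs]

# Build (train_part, test_part) pairs, then transpose them with zip(*...).
train, test = zip(*[(df[:sp], df[sp:]) for df, sp in zip(dfs, split_point)])

train_lengths = [len(df) for df in train]
test_lengths = [len(df) for df in test]
print(train_lengths)  # [75, 25, 5]
print(test_lengths)   # [125, 125, 125]
```

One thing to watch: if a frame has fewer than 125 rows, its split point becomes negative and the slice counts from the end of the frame, so it is probably worth checking the lengths first.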
To make this clearer, consider the following simpler example:
r = [(i,i**2) for i in range(5)]
a,b=zip(*r)
Then a is (0, 1, 2, 3, 4) and b is (0, 1, 4, 9, 16).
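The same idea explains why the first attempt in the question fails. Apart from slicing with the whole split_point list rather than a single integer, it unpacks a list of pairs straight into two names, which only succeeds when the list happens to contain exactly two pairs. A small sketch, with plain lists standing in for DataFrames and the names items and cut invented for the example:

```python
# Plain lists stand in for DataFrames; they slice the same way by position.
items = [list(range(n)) for n in (6, 5, 4)]
cut = [len(x) - 2 for x in items]  # keep the last 2 elements as "test"

pairs = [(x[:c], x[c:]) for x, c in zip(items, cut)]

# Unpacking the list of pairs directly fails: 3 pairs, but only 2 names.
error_message = ""
try:
    train, test = pairs
except ValueError as err:
    error_message = str(err)

# zip(*pairs) transposes the pairs into one tuple of train parts
# and one tuple of test parts.
train, test = zip(*pairs)
print(train)  # ([0, 1, 2, 3], [0, 1, 2], [0, 1])
print(test)   # ([4, 5], [3, 4], [2, 3])
```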
Upvotes: 1