Reputation: 119
I have a data frame looking like this:
Date Product Quantity Price Buy/Sell
8/11 Apple 5 5 b
8/11 Apple 5 4 b
8/12 Pear 11 4 b
8/13 Pear 4 3 b
8/13 Pear 5 6 s
I am trying to distribute them according to a split. In this case, say 60% and 40%.
The top 60% would go to one df, the lower 40% goes to another.
So the result would be
output df1
8/11 Apple 5 5 b
8/11 Apple 1 4 b
8/12 Pear 10 4 b
8/13 Pear 3 6 s
output df2
8/11 Apple 4 4 b
8/12 Pear 1 4 b
8/13 Pear 4 3 b
8/13 Pear 2 6 s
I have grouped them with df.groupby(["product", "Buy/Sell"]), but im not sure how to access it to obatin the individual groups.
Im thinking after I group them, I can have a counter moving to the 60% side until it can't move whole entries anymore, then split the next one. After that, the rest going to 40%.
How would I go about accessing the groupby elements?
Is this a good way to go about it?
Upvotes: 0
Views: 34
Reputation: 11893
I think you are OK going this route. You will have to devise some quick function to split up the ones that need to be divided, which will be a bit of work.
You can access the groups out of the grouped object as below. The "GroupBy" object is iterable, and when you iterate on it, you get back a tuple with the group name (key) and a dataframe of that group.
In [43]: df
Out[43]:
Date Product Quantity Price Buy/Sell
0 8/11 Apple 5 5 b
1 8/11 Apple 5 4 b
2 8/12 Pear 11 4 b
3 8/13 Pear 4 3 b
4 8/13 Pear 5 6 s
In [44]: grouped = df.groupby(['Product', 'Buy/Sell'])
In [45]: type(grouped)
Out[45]: pandas.core.groupby.generic.DataFrameGroupBy
In [46]: for group_name, group in grouped:
...: print(group_name)
...: print(type(group))
...: print(group)
...: grp_tot = group['Quantity'].sum()
...: print(f'Total quantity within this group is {grp_tot}')
...: print('\n')
...:
('Apple', 'b')
<class 'pandas.core.frame.DataFrame'>
Date Product Quantity Price Buy/Sell
0 8/11 Apple 5 5 b
1 8/11 Apple 5 4 b
Total quantity within this group is 10
('Pear', 'b')
<class 'pandas.core.frame.DataFrame'>
Date Product Quantity Price Buy/Sell
2 8/12 Pear 11 4 b
3 8/13 Pear 4 3 b
Total quantity within this group is 15
('Pear', 's')
<class 'pandas.core.frame.DataFrame'>
Date Product Quantity Price Buy/Sell
4 8/13 Pear 5 6 s
Total quantity within this group is 5
I would use something like this and make another "destination" column or such in the overall dataframe and mark it with "1" or "2" for which split-out you want it to go into, and then you can just use that as a selection, which will avoid row-by-row appends, which is slow. Then you will have to go back and find the ones that need to be split up and work them.
Upvotes: 1