llssff
llssff

Reputation: 119

Access groupby Pandas taking first n quantity

I have a data frame looking like this:

Date Product Quantity Price Buy/Sell
8/11 Apple       5      5      b
8/11 Apple       5      4      b
8/12 Pear        11     4      b
8/13 Pear        4      3      b
8/13 Pear        5      6      s

I am trying to distribute them according to a split. In this case, say 60% and 40%.

The top 60% would go to one df, the lower 40% goes to another.

So the result would be

output df1
8/11 Apple  5 5 b
8/11 Apple  1 4 b
8/12 Pear  10 4 b
8/13 Pear   3 6 s

output df2
8/11 Apple  4 4 b
8/12 Pear   1 4 b
8/13 Pear   4 3 b
8/13 Pear   2 6 s

I have grouped them with df.groupby(["product", "Buy/Sell"]), but im not sure how to access it to obatin the individual groups.

Im thinking after I group them, I can have a counter moving to the 60% side until it can't move whole entries anymore, then split the next one. After that, the rest going to 40%.

How would I go about accessing the groupby elements?

Is this a good way to go about it?

Upvotes: 0

Views: 34

Answers (1)

AirSquid
AirSquid

Reputation: 11893

I think you are OK going this route. You will have to devise some quick function to split up the ones that need to be divided, which will be a bit of work.

You can access the groups out of the grouped object as below. The "GroupBy" object is iterable, and when you iterate on it, you get back a tuple with the group name (key) and a dataframe of that group.

In [43]: df                                                                       
Out[43]: 
   Date Product  Quantity  Price Buy/Sell
0  8/11   Apple         5      5        b
1  8/11   Apple         5      4        b
2  8/12    Pear        11      4        b
3  8/13    Pear         4      3        b
4  8/13    Pear         5      6        s

In [44]: grouped = df.groupby(['Product', 'Buy/Sell'])                            

In [45]: type(grouped)                                                            
Out[45]: pandas.core.groupby.generic.DataFrameGroupBy

In [46]: for group_name, group in grouped: 
    ...:     print(group_name) 
    ...:     print(type(group)) 
    ...:     print(group) 
    ...:     grp_tot = group['Quantity'].sum() 
    ...:     print(f'Total quantity within this group is {grp_tot}') 
    ...:     print('\n') 
    ...:                                                                          
('Apple', 'b')
<class 'pandas.core.frame.DataFrame'>
   Date Product  Quantity  Price Buy/Sell
0  8/11   Apple         5      5        b
1  8/11   Apple         5      4        b
Total quantity within this group is 10


('Pear', 'b')
<class 'pandas.core.frame.DataFrame'>
   Date Product  Quantity  Price Buy/Sell
2  8/12    Pear        11      4        b
3  8/13    Pear         4      3        b
Total quantity within this group is 15


('Pear', 's')
<class 'pandas.core.frame.DataFrame'>
   Date Product  Quantity  Price Buy/Sell
4  8/13    Pear         5      6        s
Total quantity within this group is 5

I would use something like this and make another "destination" column or such in the overall dataframe and mark it with "1" or "2" for which split-out you want it to go into, and then you can just use that as a selection, which will avoid row-by-row appends, which is slow. Then you will have to go back and find the ones that need to be split up and work them.

Upvotes: 1

Related Questions