i1100362
i1100362

Reputation: 259

How to split the values inside the Dataframe using python

My dataframe :

df
Object      quantity
A             3
B             4
C             10
D             11

My desired result:

df
Object      quantity
A             3
B             4
C             4
C             4
C             2
D             4
D             4
D             3

My Goal here is to split the value stored in the column2 "quantity" so that it should be 4 or less than 4.

Which method I can use to solve this problem? any suggestion would be appreciated.

Upvotes: 2

Views: 131

Answers (3)

Simon
Simon

Reputation: 10150

Something like this could work. For each group where quantity is greater than 4, apply a function that does the row splitting and stores into a temp dataframe, then combine everything together to get your desired output:

df = pd.DataFrame({'idx': ['A', 'B', 'C', 'D'],
                   'quantity': [3, 4, 10, 11]})

def split_quant(df):
    quantities = ([4]*(df['quantity'].iat[0] // 4)) + [df['quantity'].iat[0] % 4]

    temp = pd.DataFrame({'idx': df['idx'].iat[0],
                         'quantity': quantities
                         }, index=range(len(quantities)))
    temp = temp[temp['quantity']!=0]

    return temp

df_split = df[df['quantity'] > 4].groupby('idx').apply(split_quant)

output = df[df['quantity'] <= 4].append(df_split).reset_index(drop=True)

writer = pd.ExcelWriter('output.xlsx')
output.to_excel(writer, 'Sheet1', index=False)
writer.save()

The above will give you the following output dataframe:

  idx  quantity
0   A         3
1   B         4
2   C         4
3   C         4
4   C         2
5   D         4
6   D         4
7   D         3

EDIT:

I took the liberty of running some timing tests of the various methods. Using Pandas' groupby and apply saves a lot of time and avoids the nested loops over the input data (although I'm sure theres an even faster way that can avoid the apply as well...)

Mine:

5.49 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@Iqbal Basyar:

22.8 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

@sobek

17.7 ms ± 922 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 1

sobek
sobek

Reputation: 1426

This works but as far as pandas is concerned, it's not pretty nor is it fast:

df = pd.DataFrame({'idx': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                   'quantity': [1., 2., 3., 4., 5., 6., 7.]})

df['factor'] = df.quantity // 4.
df['modulo'] = df.quantity % 4.

res = pd.DataFrame({'idx': [], 'quantity': []})

for idx, row in df.iterrows():
    for idxx in range(int(row.factor)):
        res = res.append({'idx': row.idx, 'quantity': 4.},
                         ignore_index=True)
    if row.modulo > 0:
        res = res.append({'idx': row.idx, 'quantity': row.modulo},
                         ignore_index=True)

In [24]: df
Out[24]: 
  idx  quantity
0   A       1.0
1   B       2.0
2   C       3.0
3   D       4.0
4   E       5.0
5   F       6.0
6   G       7.0

In [22]: res
Out[22]: 
  idx  quantity
0   A       1.0
1   B       2.0
2   C       3.0
3   D       4.0
4   E       4.0
5   E       1.0
6   F       4.0
7   F       2.0
8   G       4.0
9   G       3.0

Upvotes: 1

Iqbal Basyar
Iqbal Basyar

Reputation: 167

Unfortunately, Pandas did not support this feature. So you have to create a new dataframe based on your old dataframe.

For each item in old dataframe, calculate

old_quantity = n * 4 + rest_quantitity

So in the new dataframe, you will add n item(s) with quantity of 4 , plus one with quantity of rest_quantity (if rest_quantity is not zero)

df = df = pd.DataFrame({'item': ["A","B","C"], 'qty': [3, 8,11]})
new_df = pd.DataFrame({'Item': [], 'qty': []})

for idx, item in df.iterrows():    
  if item['qty'] > 4 :
      n = item['qty'] // 4
      r = item['qty'] % 4 
      for _ in range(n):
          new_df.loc[len(new_df)] = [item['item'], 4]
      if r > 0 :
          new_df.loc[len(new_df)] = [item['item'], r]
  else :
      new_df.loc[len(new_df)] = [item['item'], item['qty']]

df

    item qty
0   A   3
1   B   8
2   C   11

new_df

   Item qty
0   A   3.0
1   B   4.0
2   B   4.0
3   C   4.0
4   C   4.0
5   C   3.0

Upvotes: 1

Related Questions