Leo
Leo

Reputation: 1168

Pandas: use apply to create 2 new columns

I have a dataset where col a represent the number of total values in values e,i,d,t which are in string format separated by a "-"

   a              e                i            d        t
0  4  40-80-120-150  0.5-0.3-0.2-0.2  30-32-30-32  1-1-1-1
1  4    40-40-40-40  0.1-0.1-0.1-0.1  18-18-18-18  1-2-3-4
3  4  40-80-120-150  0.5-0.3-0.2-0.2  30-32-30-32  1-1-1-1
5  4    40-40-40-40  0.1-0.1-0.1-0.1  18-18-18-18  1-2-3-4

I want to create 8 new columns, 4 representing the SUM of (e-i-d-t), 4 the product.

For example:

def funct_two_outputs(E, I, d, t, d_calib = 50):
  return E+i+d+t, E*i*d*t

OUT first 2 values: SUM_0, row0 = 40+0.5+30+1 SUM_1 = 80+0.3+32+1

The sum and product are example functions substituting my functions which are a bit more complicated.

I have written out a function **expand_on_col ** that creates separates all the e,i,d,t values into new columns:

def expand_on_col (df_, col_to_split = "namecol", sep='-', prefix="this"): 
  '''
  Pass a df indicating on which col you want to split, 
  return a df with the col split with a prefix.  
  
  '''
  df1 = df_[col_to_split].str.split(sep,expand=True).add_prefix(prefix)
  df1 = pd.concat([df_,df1], axis=1).replace(np.nan, '-')
  return df1 

Now i need to create 4 new columsn that are the sum of eidt, and 4 that are the prodct.

Example output for SUM:

   index  a              e                i            d        t  a-0  e-0  e-1  e-2  e-3  i-0  i-1  i-2  i-3  d-0  d-1  d-2  d-3  t-0  t-1  t-2  t-3  sum-0  sum-1  sum-2  sum-3
0      0  4  40-80-120-150  0.5-0.3-0.2-0.2  30-32-30-32  1-1-1-1    4   40   80  120  150  0.5  0.3  0.2  0.2   30   32   30   32    1    1    1    1     71    114    153    186
1      1  4    40-40-40-40  0.1-0.1-0.1-0.1  18-18-18-18  1-2-3-4    4   40   40   40   40  0.1  0.1  0.1  0.1   18   18   18   18    1    2    3    4     59     61     63     65
2      3  4  40-80-120-150  0.5-0.3-0.2-0.2  30-32-30-32  1-1-1-1    4   40   80  120  150  0.5  0.3  0.2  0.2   30   32   30   32    1    1    1    1     71    114    153    186
3      5  4    40-40-40-40  0.1-0.1-0.1-0.1  18-18-18-18  1-2-3-4    4   40   40   40   40  0.1  0.1  0.1  0.1   18   18   18   18    1    2    3    4     59     61     63     65

If i run the code with funct_one_output(only returns sum) it works, but wit the funct_two_outputs(suma and product) I get an error.

Here is the code:

import pandas as pd 
def expand_on_col (df_, col_to_split = "namecol", sep='-', prefix="this"): 
  '''
  Pass a df indicating on which col you want to split, 
  return a df with the col split with a prefix.  
  
  '''
  df1 = df_[col_to_split].str.split(sep,expand=True).add_prefix(prefix)
  df1 = pd.concat([df_,df1], axis=1).replace(np.nan, '-')
  return df1 


def funct_two_outputs(E, I, d, t, d_calib = 50): #the function i want to pass
  return E+i+d+t,  E*i*d*t

def funct_one_outputs(E, I, d, t, d_calib = 50): #for now i can olny use this one, cant use 2 return values.
  return E+i+d+t

for col in columns: 
  df = expand_on_col (df_=df, col_to_split = col,  sep='-', prefix=f"{col}-")
  cols_ = df.columns.drop(columns)

  df[cols_]= df[cols_].apply(pd.to_numeric, errors="coerce")
df["a"] = df["a"].apply(pd.to_numeric, errors="coerce")
df.reset_index(inplace=True)
for i in range (max(df["a"])):  
  name_1, name_2 = f"sum-{i}", f"mult-{i}" 
  df[name_1] = df.apply(lambda row: funct_one_outputs(E= row[f'e-{i}'], I=row[f'i-{i}'], d=row[f'd-{i}'], t=row[f"t-{i}"]), axis=1)
   #if i try and fill 2 outputs it wont work 
  df[[name_1, name_2]] = df.apply(lambda row: funct_two_outputs(E= row[f'e-{i}'], I=row[f'i-{i}'], d=row[f'd-{i}'], t=row[f"t-{i}"]), axis=1)

OUT:

ValueError                                Traceback (most recent call last)
<ipython-input-306-85157b89d696> in <module>()
     68   df[name_1] = df.apply(lambda row: funct_one_outputs(E= row[f'e-{i}'], I=row[f'i-{i}'], d=row[f'd-{i}'], t=row[f"t-{i}"]), axis=1)
     69    #if i try and fill 2 outputs it wont work
---> 70   df[[name_1, name_2]] = df.apply(lambda row: funct_two_outputs(E= row[f'e-{i}'], I=row[f'i-{i}'], d=row[f'd-{i}'], t=row[f"t-{i}"]), axis=1)
     71 
     72 

2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3039             self._setitem_frame(key, value)
   3040         elif isinstance(key, (Series, np.ndarray, list, Index)):
-> 3041             self._setitem_array(key, value)
   3042         else:
   3043             # set column

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in _setitem_array(self, key, value)
   3074                 )[1]
   3075                 self._check_setitem_copy()
-> 3076                 self.iloc._setitem_with_indexer((slice(None), indexer), value)
   3077 
   3078     def _setitem_frame(self, key, value):

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
   1751                     if len(ilocs) != len(value):
   1752                         raise ValueError(
-> 1753                             "Must have equal len keys and value "
   1754                             "when setting with an iterable"
   1755                         )

ValueError: Must have equal len keys and value when setting with an iterable

Upvotes: 2

Views: 63

Answers (1)

piRSquared
piRSquared

Reputation: 294258

Don't Use apply

If you can help it

s = pd.to_numeric(
    df[['e', 'i', 'd', 't']]
      .stack()
      .str.split('-', expand=True)
      .stack()
)

sums = s.sum(level=[0, 2]).rename('Sum')
prods = s.prod(level=[0, 2]).rename('Prod')

sums_prods = pd.concat([sums, prods], axis=1).unstack()

sums_prods.columns = [f'{o}-{i}' for o, i in sums_prods.columns]

df.join(sums_prods)

   a              e                i            d        t  Sum-0  Sum-1  Sum-2  Sum-3  Prod-0  Prod-1  Prod-2  Prod-3
0  4  40-80-120-150  0.5-0.3-0.2-0.2  30-32-30-32  1-1-1-1   71.5  113.3  151.2  183.2   600.0   768.0   720.0   960.0
1  4    40-40-40-40  0.1-0.1-0.1-0.1  18-18-18-18  1-2-3-4   59.1   60.1   61.1   62.1    72.0   144.0   216.0   288.0
3  4  40-80-120-150  0.5-0.3-0.2-0.2  30-32-30-32  1-1-1-1   71.5  113.3  151.2  183.2   600.0   768.0   720.0   960.0
5  4    40-40-40-40  0.1-0.1-0.1-0.1  18-18-18-18  1-2-3-4   59.1   60.1   61.1   62.1    72.0   144.0   216.0   288.0

Upvotes: 2

Related Questions