create new columns or dataframe with a function?

Question

My question is how to simplify the following code. In my real data I would have to add 540 columns, but I guess there is a better way, especially for generating the columns. Maybe even separate dataframes?

Below you'll see the test-df I have with one column needed. 'bin_X_0' through 'bin_X_9' need to be generated, but then several, i.e. bin_Y_0, bin_Z_0 etc. through 9.

N = 10000
J = [2012,2013,2014]
K = ['A','B','C','D','E','F','G','H']
L = ['h', 'd', 'a'] 
S = ['AR1','PO1','RU1']

np.random.seed(0)

df = pd.DataFrame(
    {'Y':np.random.uniform(1,10,N),
     'X':np.random.uniform(1,10,N),
     'Z':np.random.uniform(1,10,N),
     'J':np.random.choice(J,N),
     'S':np.random.choice(S,N),
     'R':np.random.choice(L,N)
    })

df['bins_X'] = df.groupby('S').X.apply(pd.qcut, q=10, labels=np.arange(10))

df['bin_X_0'] = np.where((df['bins_X'] ==0) & (df['R'] =='a'), (df['X']*2)-2, 
                        np.where((df['bins_X'] ==0) & (df['R'] !='a'), -2, 0))

df.head()

Marjan Moderc · Accepted Answer

You can achieve this with a double for loop, by using current iterables and format function for the new column generation. The following code should do it:

for a in ["X", "Y", "Z"]:
    for num in list(range(0, 10)):
        df['bins_{}'.format(a)] = df.groupby('S')["{}".format(a)].apply(pd.qcut, q=10, labels=np.arange(10))

        df['bin_{}_{}'.format(a, num)] = np.where((df['bins_{}'.format(a)] == num) & (df['R'] == 'a'),
                                                  (df['{}'.format(a)] * 2) - 2,
                                                  np.where((df['bins_{}'.format(a)] == num) & (df['R'] != 'a'), -2, 0))

I hope it is clear, how current numbers and letters are dynamically picking up the appropriate data and generating two new columns inside every nested loop. The above code produces this dataframe, which I believe is what you are looking for:

 J  R    S         X         Y         Z bins_X   bin_X_0  bin_X_1  \
0  2014  d  PO1  7.734412  5.939322  4.529557      7  0.000000      0.0   
1  2014  h  AR1  2.621824  7.436704  1.370409      1  0.000000     -2.0   
2  2013  h  PO1  4.501208  6.424870  9.309705      4  0.000000      0.0   
3  2012  a  RU1  1.338402  5.903949  4.656115      0  0.676803      0.0   
4  2012  d  RU1  1.106090  4.812893  9.498540      0 -2.000000      0.0   

   bin_X_2   ...     bin_Z_0  bin_Z_1  bin_Z_2  bin_Z_3  bin_Z_4  bin_Z_5  \
0      0.0   ...         0.0      0.0      0.0     -2.0  0.00000      0.0   
1      0.0   ...        -2.0      0.0      0.0      0.0  0.00000      0.0   
2      0.0   ...         0.0      0.0      0.0      0.0  0.00000      0.0   
3      0.0   ...         0.0      0.0      0.0      0.0  7.31223      0.0   
4      0.0   ...         0.0      0.0      0.0      0.0  0.00000      0.0   

   bin_Z_6 bin_Z_7  bin_Z_8  bin_Z_9  
0      0.0     0.0      0.0      0.0  
1      0.0     0.0      0.0      0.0  
2      0.0     0.0      0.0     -2.0  
3      0.0     0.0      0.0      0.0  
4      0.0     0.0      0.0     -2.0

I hope you get the idea.

create new columns or dataframe with a function?

Answers (2)

Related Questions