Creating new pandas columns based on conditions on existing columns

Question

I have a dataframe as shown below:

col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1,col2),columns=['name','count'])

    name    count
0   a       1
1   b       1
2   c       0
3   a       1
4   c       1
5   a       0
6   b       1
7   c       1
8   a       0

I am trying to find the ratio of the number of zeros to the total number of zeros and ones for each element in the "name" column. First, I aggregated the counts as follows:

for j in df2.name.unique():
    print(j)
    zero_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0]
    full_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0] + zero_one_frequencies[zero_one_frequencies['name'] == j][1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO ratios for {j} = {zero_pb}")
    print(f"One ratios for {j} = {one_pb}")
    print("="*30)

And the output looks like:

a
ZERO ratios for a = 0    0.5
dtype: float64
One ratios for a = 0    0.5
dtype: float64
==============================
b
ZERO ratios for b = 1    0.0
dtype: float64
One ratios for b = 1    1.0
dtype: float64
==============================
c
ZERO ratios for c = 2    0.333333
dtype: float64
One ratios for c = 2    0.666667
dtype: float64
==============================
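
The loop above depends on zero_one_frequencies, whose construction is not shown. As a rough sketch only, assuming it is a table with one row per name and integer column labels 0 and 1 holding the counts (an assumption inferred from how it is indexed, not something stated in the question), it could be built with pd.crosstab, and the per-name ratios then follow without a loop:

```python
import pandas as pd

col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Assumed layout: columns 'name', 0 and 1, one row per distinct name
zero_one_frequencies = pd.crosstab(df2['name'], df2['count']).reset_index()

# Divide each count by the row total to get the zero and one ratios per name
counts = zero_one_frequencies[[0, 1]]
ratios = counts.div(counts.sum(axis=1), axis=0)
ratios.insert(0, 'name', zero_one_frequencies['name'])
print(ratios)
```

The name ratios is a placeholder chosen for this sketch; column 0 holds the zero ratio and column 1 the one ratio for each name.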

My goal is to add two new columns to the dataframe, "name_0" and "name_1", with the ratio values for each element in the "name" column. I tried the following, but it is not giving the expected results:

for j in df2.name.unique():
    print(j)
    zero_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0]
    full_ct = zero_one_frequencies[zero_one_frequencies['name'] == j][0] + zero_one_frequencies[zero_one_frequencies['name'] == j][1]
    zero_pb = zero_ct / full_ct
    one_pb = 1 - zero_pb
    print(f"ZERO Probability for {j} = {zero_pb}")
    print(f"One Probability for {j} = {one_pb}")
    print("="*30)
    
    condition1 = [ df2['name'].eq(j) & df2['count'].eq(0)]
    condition2 = [ df2['name'].eq(j) & df2['count'].eq(1)]
    choice1 = zero_pb.tolist()
    choice2 = one_pb.tolist()

    print(f'choice1 = {choice1}, choice2 = {choice2}')
    df2["name"+str("_0")] = np.select(condition1, choice1, default=0)
    df2["name"+str("_1")] = np.select(condition2, choice2, default=0)

The columns end up with only the values for the name element 'c'. That is to be expected, since the values computed in the last iteration overwrite the whole column.

Is there another way to use np.select effectively?

Expected output:

    name    count   name_0      name_1
0   a       1       0.000000    0.500000
1   b       1       0.000000    1.000000
2   c       0       0.333333    0.000000
3   a       1       0.000000    0.500000
4   c       1       0.000000    0.666667
5   a       0       0.500000    0.000000
6   b       1       0.000000    1.000000
7   c       1       0.000000    0.666667
8   a       0       0.500000    0.000000

Oddaspa · Accepted Answer

I did not have access to the zero_one_frequencies dataframe, so I took the liberty of solving the problem my own way.

import pandas as pd
import numpy as np
col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1,col2),columns=['name','count'])

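# Start both columns as floats so the ratios are not truncated
df2["name_0"] = 0.0
df2["name_1"] = 0.0

for name in df2['name'].unique():
    is_name = df2['name'] == name
    # Fraction of ones among the rows with this name
    prob_1 = df2.loc[is_name, 'count'].mean()
    for count in df2['count'].unique():
        mask = is_name & (df2['count'] == count)
        # count == 1 gives prob_1, count == 0 gives 1 - prob_1
        # A single .loc call avoids chained assignment
        df2.loc[mask, "name_" + str(count)] = np.abs(((count + 1) % 2) - prob_1)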

Output:

    name    count   name_0      name_1
0   a       1       0.000000    0.500000
1   b       1       0.000000    1.000000
2   c       0       0.333333    0.000000
3   a       1       0.000000    0.500000
4   c       1       0.000000    0.666667
5   a       0       0.500000    0.000000
6   b       1       0.000000    1.000000
7   c       1       0.000000    0.666667
8   a       0       0.500000    0.000000

For understanding np.select I recommend seeing this post.
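
Since the question asked specifically about np.select, here is a minimal loop-free sketch of how it could be applied. It swaps the loop above for groupby(...).transform('mean'), which broadcasts each name's share of ones back to every row; the variable names one_ratio and zero_ratio are placeholders introduced for this sketch.

```python
import numpy as np
import pandas as pd

col1 = ['a','b','c','a','c','a','b','c','a']
col2 = [1,1,0,1,1,0,1,1,0]
df2 = pd.DataFrame(zip(col1, col2), columns=['name', 'count'])

# Share of ones within each name, repeated on every row of that name
one_ratio = df2.groupby('name')['count'].transform('mean')
zero_ratio = 1 - one_ratio

# Each row keeps only the ratio that matches its own count; the other column gets 0
df2['name_0'] = np.select([df2['count'].eq(0)], [zero_ratio], default=0.0)
df2['name_1'] = np.select([df2['count'].eq(1)], [one_ratio], default=0.0)
print(df2)
```

Because the conditions and choices cover all rows at once, nothing is overwritten between names, which is what went wrong in the per-name loop from the question. With a single condition per column, np.where would do the same job; np.select pays off when there are more than two cases.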
