some_programmer

Reputation: 3558

How to create a column of zeros for a value that is missing from a column?

I have a DataFrame s19_df stored in a dictionary Bgf, as follows:

BacksGas_Flow_sccm  ContextID   StepID  Time_Elapsed    iso_forest
61.81640625 7289972 19  40.503  -1
62.59765625 7289972 19  41.503  -1
63.671875   7289972 19  42.503  1
65.625  7289972 19  43.503  1
61.81640625 7289973 19  40.448  -1
62.59765625 7289973 19  41.448  -1
63.671875   7289973 19  42.448  1
65.625  7289973 19  43.448  1

I wrote a function that groups by the ContextID column, counts the +1s and -1s in the iso_forest column, and then calculates the ratio of -1s to +1s:

def minus1_plus1_ratio(dictionary, new_df, step_df):
    dictionary[new_df] = dictionary[step_df].groupby(['ContextID', 'iso_forest']).size().reset_index(name='count')
    dictionary[new_df] = pd.pivot_table(dictionary[new_df], values = 'count', columns = ['iso_forest'], 
                                          index = ['ContextID']).fillna(value = 0)
    dictionary[new_df]['-1/1'] =  (dictionary[new_df][-1])/(dictionary[new_df][1])
    dictionary[new_df] = dictionary[new_df].sort_values(by = '-1/1', ascending = False)
    return dictionary[new_df]

When I run the function on the above df:

minus1_plus1_ratio(Bgf, 's19_-1/1', 's19_df')

it works fine, because the iso_forest column contains both -1s and +1s.

But for a df as follows:

BacksGas_Flow_sccm  ContextID   StepID  Time_Elapsed    iso_forest
61.81640625 7289972 19  40.503  1
62.59765625 7289972 19  41.503  1
63.671875   7289972 19  42.503  1
65.625  7289972 19  43.503  1
61.81640625 7289973 19  40.448  1
62.59765625 7289973 19  41.448  1
63.671875   7289973 19  42.448  1
65.625  7289973 19  43.448  1

where the iso_forest column contains only +1s, running the function raises KeyError: -1, because the pivot table has no -1 column.

What I would like is this: if there are no -1s, then before the

dictionary[new_df]['-1/1'] =  (dictionary[new_df][-1])/(dictionary[new_df][1])

step, a column named -1 should be created and filled with zeros.

Similarly, there might be cases where only -1s are present and there are no +1s. In that situation, a column named 1 should be created and filled with zeros.

Can someone help me with the logic for achieving this?

Upvotes: 1

Views: 43

Answers (1)

Quang Hoang

Reputation: 150785

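You can use unstack and reindex. After unstack(level=0), the iso_forest labels form the index, so reindex([-1, 1], fill_value=0) adds a row of zeros for whichever label is missing, and the transpose turns the labels into columns: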

(df.groupby('ContextID').iso_forest
   .value_counts()
   .unstack(level=0, fill_value=0)
   .reindex([-1,1],fill_value=0).T
)

Output:

iso_forest  -1   1
ContextID         
7289972      0   4
7289973      0   4
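
As a rough sketch of how this could be folded back into the function from the question (not necessarily how the answer intended it to be used), the version below reindexes the columns directly instead of transposing. The sample dictionary at the bottom is made up for illustration and only holds the two columns the function needs:

```python
import pandas as pd


def minus1_plus1_ratio(dictionary, new_df, step_df):
    # Count rows per (ContextID, iso_forest) pair and spread the labels into columns.
    counts = (
        dictionary[step_df]
        .groupby("ContextID")["iso_forest"]
        .value_counts()
        .unstack(fill_value=0)
        # Guarantee both label columns exist; a missing label becomes a column of zeros.
        .reindex(columns=[-1, 1], fill_value=0)
    )
    # A zero count of +1s gives inf (or NaN for 0/0) instead of raising an error.
    counts["-1/1"] = counts[-1] / counts[1]
    dictionary[new_df] = counts.sort_values(by="-1/1", ascending=False)
    return dictionary[new_df]


# Illustrative data only: two ContextIDs whose iso_forest labels are all +1.
Bgf = {
    "s19_df": pd.DataFrame(
        {
            "ContextID": [7289972] * 4 + [7289973] * 4,
            "iso_forest": [1] * 8,
        }
    )
}
result = minus1_plus1_ratio(Bgf, "s19_-1/1", "s19_df")
```

Note that when a ContextID has no +1s at all, the ratio comes out as inf, so you may want to decide separately how such rows should be sorted or displayed.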

Upvotes: 2
