Gaurav Srivastava

Reputation: 545

Pandas: Generate a Dataframe column which has values depending on another column of a dataframe

I am trying to generate a pandas DataFrame in which one column holds numerical values derived from a column of another dataframe. For example, I want to generate a new dataframe from the column 'c1' of the dataframe df_:

ipdb> df_ = pd.DataFrame({'c1':[False, True, False, True]})
ipdb> df_
      c1
0  False
1   True
2  False
3   True

From df_, I want to generate another dataframe, df1, with the columns below:

ipdb> df1
   col1  col2
0     0   NaN
1     1     0
2     2   NaN
3     3     1

Here, 'col1' holds the normal index values, and 'col2' has NaN in the rows where 'c1' is False in df_ and sequentially incrementing values where 'c1' is True.

Here is what I have tried so far:

ipdb> df_[df_['c1']==True].reset_index().reset_index()
   level_0  index    c1
0        0      1  True
1        1      3  True

However, I feel there should be a better way to generate a dataframe with the two columns as in df1.

Upvotes: 1

Views: 93

Answers (2)

jezrael

Reputation: 862511

I think you need cumsum, then subtract 1 so that counting starts from 0:

df_ = pd.DataFrame({'c1':[False, True, False, True]})

df_['col2'] = df_.loc[df_['c1'], 'c1'].cumsum().sub(1)
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0
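To get both columns of the question's df1 in one go, the same cumsum idea can be combined with Series.where. This is only a sketch under the assumption that 'col1' is simply the row position; the name counter is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# Running count of True values, shifted so the first True gets 0.
# `counter` is a hypothetical helper name used only in this sketch.
counter = df_['c1'].cumsum().sub(1)

df1 = pd.DataFrame({
    'col1': list(range(len(df_))),     # assumed: plain row positions
    'col2': counter.where(df_['c1']),  # NaN wherever c1 is False
})
print(df1)
```

where keeps the counter only in rows where 'c1' is True and fills the rest with NaN, so no boolean indexing on the left-hand side is needed.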

Another solution is to count the True values with sum, build a sequence of that length with numpy.arange, and assign it back to the filtered rows:

import numpy as np

df_.loc[df_['c1'], 'col2'] = np.arange(df_['c1'].sum())
print (df_)
      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0

Details:

print (df_['c1'].sum())
2

print (np.arange(df_['c1'].sum()))
[0 1]
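Both solutions give a float column (0.0, 1.0) because NaN forces a float dtype. If integers as in the question's df1 are wanted, one possible option is the nullable 'Int64' dtype. The sketch below assumes a pandas version that supports it; missing values then display as <NA> rather than NaN.

```python
import numpy as np
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})
df_.loc[df_['c1'], 'col2'] = np.arange(df_['c1'].sum())

# Assumption: pandas supports the nullable integer dtype 'Int64'.
# The cast succeeds here because every non-missing value is a whole number.
df_['col2'] = df_['col2'].astype('Int64')
print(df_)
```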

Upvotes: 2

Mohamed Thasin ah

Reputation: 11192

Another way to solve this (df here is the question's df_):

df.loc[df['c1'],'col2']=range(len(df[df['c1']]))

Output:

      c1  col2
0  False   NaN
1   True   0.0
2  False   NaN
3   True   1.0

Upvotes: 2
