Reputation: 545
I am trying to generate a pandas DataFrame in which one column holds numerical values derived from a column of another DataFrame. Below is an example, starting from the DataFrame df_:
ipdb> df_ = pd.DataFrame({'c1':[False, True, False, True]})
ipdb> df_
c1
0 False
1 True
2 False
3 True
From df_, I want to generate another DataFrame, df1, with the columns below.
ipdb> df1
col1 col2
0 0 NaN
1 1 0
2 2 NaN
3 3 1
Here, 'col1' holds the ordinary index values, and 'col2' has NaN in the rows where 'c1' in df_ is False and sequentially incrementing values (starting from 0) in the rows where 'c1' is True.
To generate this dataframe, below is what I have tried.
ipdb> df_[df_['c1']==True].reset_index().reset_index()
level_0 index c1
0 0 1 True
1 1 3 True
However, I feel there should be a better way to generate a dataframe with the two columns as in df1.
Upvotes: 1
Views: 93
Reputation: 862511
I think you need cumsum, then subtract 1 so that the count starts from 0:
df_ = pd.DataFrame({'c1':[False, True, False, True]})
df_['col2'] = df_.loc[df_['c1'], 'c1'].cumsum().sub(1)
print (df_)
c1 col2
0 False NaN
1 True 0.0
2 False NaN
3 True 1.0
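The question also asks for a separate df1 with both col1 and col2. A minimal sketch of one way to get there, assuming the same cumsum idea applied to the whole boolean column and then masked with where (the variable name counter is only illustrative):

```python
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# cumsum over the boolean column counts the True values seen so far (1, 2, ...);
# subtracting 1 starts the counter at 0, and where() sets the False rows to NaN.
counter = df_['c1'].cumsum().sub(1).where(df_['c1'])

# col1 is just the original index, col2 is the masked counter.
df1 = pd.DataFrame({'col1': df_.index, 'col2': counter})
print(df1)
```

This leaves df_ untouched, which may be preferable if the original frame is reused elsewhere.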
Another solution is to count the occurrences of True values with sum, build a matching sequence with numpy.arange, and assign it back to the filtered rows of the DataFrame:
df_.loc[df_['c1'],'col2']= np.arange(df_['c1'].sum())
print (df_)
c1 col2
0 False NaN
1 True 0.0
2 False NaN
3 True 1.0
Details:
print (df_['c1'].sum())
2
print (np.arange(df_['c1'].sum()))
[0 1]
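Both outputs above show col2 as floats (0.0, 1.0) because NaN forces a float column, while the df1 in the question shows whole numbers. If integers are wanted, one option is pandas' nullable integer dtype. A small sketch, assuming a pandas version that supports 'Int64' (missing values then display as <NA> rather than NaN):

```python
import numpy as np
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})

# Same assignment as above: number the True rows 0, 1, ...
df_.loc[df_['c1'], 'col2'] = np.arange(df_['c1'].sum())

# Convert the float column to the nullable integer dtype so the
# counter is stored as integers and the gaps stay missing.
df_['col2'] = df_['col2'].astype('Int64')
print(df_)
```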
Upvotes: 2
Reputation: 11192
Another way to solve this:
df_.loc[df_['c1'],'col2']=range(len(df_[df_['c1']]))
Output:
c1 col2
0 False NaN
1 True 0.0
2 False NaN
3 True 1.0
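As a quick sanity check, the three approaches in this thread can be compared side by side. This is only a sketch; the names mask, via_cumsum, via_arange and via_range are illustrative and do not appear in the answers:

```python
import numpy as np
import pandas as pd

df_ = pd.DataFrame({'c1': [False, True, False, True]})
mask = df_['c1']

# Approach 1: cumulative sum over the True rows, shifted to start at 0.
via_cumsum = df_.loc[mask, 'c1'].cumsum().sub(1)

# Approach 2: numpy.arange over the number of True rows.
via_arange = pd.Series(np.arange(mask.sum()), index=df_.index[mask])

# Approach 3: a plain Python range of the same length.
via_range = pd.Series(range(len(df_[mask])), index=df_.index[mask])

# All three should number the True rows identically.
print(via_cumsum.tolist(), via_arange.tolist(), via_range.tolist())
```

All three depend only on the boolean mask, so the choice between them is mostly a matter of taste.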
Upvotes: 2