Reputation: 11
I need to insert value into a column based on row index of a pandas dataframe.
import pandas as pd
df=pd.DataFrame(np.random.randint(0,100,size=(11, 4)), columns=list('ABCD'))
df['ticker']='na'
df
Sample DataFrame In the above sample dataframe, the ticker column for first 25% of the total number of records must have value '$" the next 25% of the records must have value "$$" and so on.
I tried to get the length of the dataframe and calculate 25,50,75 percent on it and then access one row at a time and assign value to "ticker" based on row index.
total_row_count=len(df)
row_25 = int(total_row_count * .25)
row_50 = int(total_row_count * .5)
row_75=int(total_row_count*.75)
if ((row.index >=0) and (row.index<=row_25)):
return"$"
elif ((row.index > row_25) and (row.index<=row_50)):
return"$$"
elif ((row.index > row_50) and (row.index<=row_75)):
return"$$$"
elif (row.index > row_75):
return"$$$$"
But I'm not able to get the row index. Please let me know if there is a different way to assign these values
Upvotes: 0
Views: 6463
Reputation: 323226
I think cut
can solve this problem
df['ticker']=pd.cut(np.arange(len(df))/len(df), [-np.inf,0.25,0.5,0.75,1], labels=["$","$$",'$$$','$$$$'],right=True)
df
Out[35]:
A B C D ticker
0 63 51 19 33 $
1 12 80 57 1 $
2 53 27 62 26 $
3 97 43 31 80 $$
4 91 22 92 11 $$
5 39 70 82 26 $$
6 32 62 17 75 $$$
7 5 59 79 72 $$$
8 75 4 47 4 $$$
9 43 5 45 66 $$$$
10 29 9 74 94 $$$$
Upvotes: 1
Reputation: 51335
I like to use np.select
for this kind of task, because I find the syntax intuitive and readable:
# Set up your conditions:
conds = [(df.index >= 0) & (df.index <= row_25),
(df.index > row_25) & (df.index<=row_50),
(df.index > row_50) & (df.index<=row_75),
(df.index > row_75)]
# Set up your target values (in the same order as your conditions)
choices = ['$', '$$', '$$$', '$$$$']
# Assign df['ticker']
df['ticker'] = np.select(conds, choices)
returns this:
>>> df
A B C D ticker
0 92 97 25 79 $
1 76 4 26 94 $
2 49 65 19 91 $
3 76 3 83 45 $$
4 83 16 0 16 $$
5 1 56 97 44 $$
6 78 17 18 86 $$$
7 55 56 83 91 $$$
8 76 16 52 33 $$$
9 55 35 80 95 $$$$
10 90 29 41 87 $$$$
Upvotes: 1
Reputation: 164623
This is one explicit solution using .loc
accessor.
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(11, 4)), columns=list('ABCD'))
n = len(df.index)
df['ticker'] = 'na'
df.loc[df.index <= n/4, 'ticker'] = '$'
df.loc[(n/4 < df.index) & (df.index <= n/2), 'ticker'] = '$$'
df.loc[(n/2 < df.index) & (df.index <= n*3/4), 'ticker'] = '$$$'
df.loc[df.index > n*3/4, 'ticker'] = '$$$$'
# A B C D ticker
# 0 47 64 7 46 $
# 1 53 55 75 3 $
# 2 93 95 28 47 $
# 3 35 88 16 7 $$
# 4 99 66 88 84 $$
# 5 75 2 72 90 $$
# 6 6 53 36 92 $$$
# 7 83 58 54 67 $$$
# 8 49 83 46 54 $$$
# 9 69 9 96 73 $$$$
# 10 84 42 11 83 $$$$
Upvotes: 0
Reputation: 433
You can set up a few np.where statements to handle this. Try something like the following:
import numpy as np
...
df['ticker'] = np.where(df.index < row_25, "$", df['ticker'])
df['ticker'] = np.where(row_25 <= df.index < row_50, "$$", df['ticker'])
df['ticker'] = np.where(row_50 <= df.index < row_75, "$$$", df['ticker'])
df['ticker'] = np.where(row_75 <= df.index, "$$$$", df['ticker'])
Upvotes: 0