I have a data frame with a `variable` column. The column holds different variables; some of them share a common size and others have a unique size. I want to create a new column based on the `variable` column.
df =
variable
0 A1
1 A2
2 B1
3 B2
4 C
5 A1
6 D
7 A1
8 A2
9 B1
# I want to create a new column `size` indicating the size of each variable.
# A1, A2 = 20
# B1, B2 = 10
# C = 5, D = 2
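The size rules above can be written down once as a grouped dictionary and expanded into a flat variable-to-size lookup. This is a minimal sketch that rebuilds the sample frame from the printout; `size_groups` and `size_lookup` are hypothetical names, not part of the original question:

```python
import pandas as pd

# Sample frame rebuilt from the printout above.
df = pd.DataFrame(
    {"variable": ["A1", "A2", "B1", "B2", "C", "A1", "D", "A1", "A2", "B1"]}
)

# Hypothetical grouping: every variable in a tuple shares one size.
size_groups = {
    ("A1", "A2"): 20,
    ("B1", "B2"): 10,
    ("C",): 5,
    ("D",): 2,
}

# Expand the groups into a flat {variable: size} lookup table.
size_lookup = {var: size for group, size in size_groups.items() for var in group}
print(size_lookup)
```

Writing each shared size once means a change to, say, the A-group size only has to be made in one place.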
My approach 1:
df['size'] = ""
# The labels must be quoted strings; unquoted A1, A2 would raise a NameError.
df.loc[df['variable'].isin(['A1', 'A2']), 'size'] = 20
df.loc[df['variable'].isin(['B1', 'B2']), 'size'] = 10
df.loc[df['variable'].isin(['C']), 'size'] = 5
df.loc[df['variable'].isin(['D']), 'size'] = 2
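The condition-per-group idea in approach 1 can also be expressed as a single vectorized assignment with `numpy.select`, a different technique from the repeated `.loc` masks. A minimal sketch, assuming the sample frame from the question and an arbitrary default of -1 for rows that match no condition:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"variable": ["A1", "A2", "B1", "B2", "C", "A1", "D", "A1", "A2", "B1"]}
)

# One boolean mask per size group; np.select takes the first matching choice per row.
conditions = [
    df["variable"].isin(["A1", "A2"]),
    df["variable"].isin(["B1", "B2"]),
    df["variable"].eq("C"),
    df["variable"].eq("D"),
]
choices = [20, 10, 5, 2]

# Rows that match no condition get the default (-1 here, an arbitrary placeholder).
df["size"] = np.select(conditions, choices, default=-1)
print(df)
```

This produces an integer column directly, rather than the object column that `df['size'] = ""` creates.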
My approach 2:
size_list = [['A1',20],['A2',20],['B1',10],['B2',10],['C',5],['D',2]]
for itm in size_list:
    # itm[0] is the variable label, itm[1] is its size
    df.loc[df['variable'].isin([itm[0]]), 'size'] = itm[1]
The first approach takes four lines and is vectorized. The second approach is just two lines but uses a for loop. Which approach should I use? Is there a much better approach?
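One way to decide is to time the candidates on a frame of realistic size. A minimal sketch, assuming a hypothetical 100,000-row frame built by repeating the sample values; `loc_loop` and `dict_map` are illustrative names. Approach 1 performs the same kind of masked `.loc` assignment as approach 2, just unrolled, so `loc_loop` stands in for both, and `dict_map` is a single dictionary lookup with `Series.map`. No timings are claimed here; run it on your own data:

```python
import timeit

import pandas as pd

# Hypothetical 100,000-row frame made by repeating the sample values.
values = ["A1", "A2", "B1", "B2", "C", "A1", "D", "A1", "A2", "B1"]
df = pd.DataFrame({"variable": values * 10_000})
size_list = [["A1", 20], ["A2", 20], ["B1", 10], ["B2", 10], ["C", 5], ["D", 2]]


def loc_loop(frame):
    """Approach 2: one masked .loc assignment per (variable, size) pair."""
    frame = frame.copy()
    for var, size in size_list:
        frame.loc[frame["variable"].isin([var]), "size"] = size
    return frame["size"]


def dict_map(frame):
    """A single dictionary lookup with Series.map."""
    return frame["variable"].map(dict(size_list))


# Both give the same sizes; the loop version comes back as floats, hence astype(int).
assert loc_loop(df).astype(int).tolist() == dict_map(df).tolist()

for func in (loc_loop, dict_map):
    seconds = timeit.timeit(lambda: func(df), number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```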
Use Series.map with a dictionary created from your list:
size_list = [['A1',20],['A2',20],['B1',10],['B2',10],['C',5],['D',2]]
df['size'] = df['variable'].map(dict(size_list))
print(df)
variable size
0 A1 20
1 A2 20
2 B1 10
3 B2 10
4 C 5
5 A1 20
6 D 2
7 A1 20
8 A2 20
9 B1 10
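One detail worth knowing with `Series.map` and a dict: any value missing from the dictionary becomes `NaN`, which also turns the column into floats. A minimal sketch, with `E` as a hypothetical variable that is not in `size_list` and 0 as an arbitrary fill value:

```python
import pandas as pd

size_list = [['A1', 20], ['A2', 20], ['B1', 10], ['B2', 10], ['C', 5], ['D', 2]]
size_map = dict(size_list)

# 'E' is a hypothetical variable that has no entry in size_list.
df = pd.DataFrame({"variable": ["A1", "C", "E"]})

# Keys missing from the dict become NaN, so this column is float64.
df["size"] = df["variable"].map(size_map)

# Fill unmapped rows with a placeholder and restore an integer dtype.
df["size_filled"] = df["variable"].map(size_map).fillna(0).astype(int)
print(df)
```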