I have a dataframe as below:
```python
import pandas as pd

df = pd.DataFrame([[1,5,'Dog'],[2,6,'Dog'],[3,7,'Cat'],[4,8,'Cat']],columns=['A','B','Type'])
```
Index | A | B | Type |
---|---|---|---|
0 | 1 | 5 | Dog |
1 | 2 | 6 | Dog |
2 | 3 | 7 | Cat |
3 | 4 | 8 | Cat |
Based on the value in the 'Type' column, I need to call that type's own function and create two new columns, C and D, from the values it returns. For example, for Dog rows I call the Dog function and put its results in C and D; likewise, for Cat rows I call the Cat function to fill C and D.
In the end, my dataframe should look like this:
Index | A | B | Type | C | D |
---|---|---|---|---|---|
0 | 1 | 5 | Dog | Dog1 Value | Dog2 Value |
1 | 2 | 6 | Dog | Dog1 Value | Dog2 Value |
2 | 3 | 7 | Cat | Cat1 Value | Cat2 Value |
3 | 4 | 8 | Cat | Cat1 Value | Cat2 Value |
The values in columns C and D are returned by these functions; the ones shown above are just placeholders.
The problem I face is this: for each value in the 'Type' column, I filter the rows, call that type's function and get columns C and D. But when I merge the result back into the original dataframe with `left_index=True` and `right_index=True`, every shared column is duplicated with `_x` and `_y` suffixes (`A_x`, `A_y`, and so on), which breaks the next iteration for the 'Cat' rows. Please advise how I should approach this problem.
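The behaviour described above can be reproduced with a cut-down version of the example frame. This is a minimal sketch, with `dogs` and `merged` as illustrative names: `pd.merge` gives column names that exist in both frames its default `_x` / `_y` suffixes, and its default `how='inner'` keeps only the rows present in both frames, so after the first pass there is no plain `Type` column left and the other types' rows are gone.

```python
import pandas as pd

df = pd.DataFrame([[1, 5, 'Dog'], [3, 7, 'Cat']], columns=['A', 'B', 'Type'])

# Filter one type and add its computed columns, as the loop in the question does
dogs = df[df['Type'] == 'Dog'].copy()
dogs['C'] = 'c'
dogs['D'] = 'd'

# A, B and Type exist in both frames, so merge renames them with the default
# '_x' / '_y' suffixes; the default how='inner' also keeps only the rows whose
# index appears in both frames, so the Cat row is dropped
merged = pd.merge(df, dogs, left_index=True, right_index=True)

print(list(merged.columns))
# ['A_x', 'B_x', 'Type_x', 'A_y', 'B_y', 'Type_y', 'C', 'D']
print(len(merged))
# 1
```

Passing `how='left'` would keep the Cat row, but the duplicated columns would remain, so writing the results back into the original frame in place, rather than merging, avoids both issues.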
Code
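```python
import pandas as pd

def ext_fun(x1, x2, i):
    if i == 'Dog':
        # Do some calc to find the C and D values and return them
        return ['c', 'd']
    if i == 'Cat':
        # Do some calc to find the C and D values and return them
        return ['c', 'd']

for i in df['Type'].unique():
    df1 = df[df.Type == i].copy()
    df1[['C', 'D']] = df1.apply(lambda x: ext_fun(x['A'], x['B'], i), result_type='expand', axis=1)
    # Merging back duplicates A, B and Type as *_x / *_y columns (and the default
    # inner join keeps only this type's rows), so df.Type no longer exists next pass
    df = pd.merge(df, df1, left_index=True, right_index=True)
```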
Note: I have 10 to 15 types in the 'Type' column, with hundreds of records of each type. The values for columns C and D are dynamic and have to be computed by a function, so which function to call depends on the value in the 'Type' column.
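With 10 to 15 types, one way to avoid a long chain of `if` branches is a dictionary that maps each `Type` value to its own function, applied once over the whole frame with `result_type='expand'`. This is a minimal sketch, assuming each function takes a row's `A` and `B` and returns a `(C, D)` pair; `dog_fun`, `cat_fun` and `TYPE_FUNCS` are hypothetical names, and the arithmetic inside them is only a placeholder for the real calculations.

```python
import pandas as pd

df = pd.DataFrame([[1, 5, 'Dog'], [2, 6, 'Dog'], [3, 7, 'Cat'], [4, 8, 'Cat']],
                  columns=['A', 'B', 'Type'])

# Hypothetical per-type functions: each takes A and B and returns (C, D).
# The arithmetic is a placeholder for the real calculations.
def dog_fun(a, b):
    return a + b, a * b

def cat_fun(a, b):
    return b - a, b * 10

# One entry per value that can appear in the Type column
TYPE_FUNCS = {'Dog': dog_fun, 'Cat': cat_fun}

# Look up each row's function by its Type; result_type='expand' turns the
# returned tuple into two columns, labelled 0 and 1
results = df.apply(
    lambda row: TYPE_FUNCS[row['Type']](row['A'], row['B']),
    axis=1,
    result_type='expand',
)

# Write the results straight into the original frame: no split, no merge
df['C'] = results[0]
df['D'] = results[1]
print(df)
```

A `Type` value with no entry in the dictionary raises a `KeyError`, which at least makes a missing function obvious instead of silently leaving gaps.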
You don't have to split and then re-merge the dataframes; you can use `.loc`:
```python
df.loc[df['Type'] == 'Dog', 'C'] = 'Dog1 Value'
df.loc[df['Type'] == 'Cat', 'C'] = 'Cat1 Value'
df.loc[df['Type'] == 'Dog', 'D'] = 'Dog2 Value'
df.loc[df['Type'] == 'Cat', 'D'] = 'Cat2 Value'
```
I don't know which values you will actually use, so I filled in the placeholders from your example.
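The same `.loc` pattern also works when C and D come from functions rather than constants: build a boolean mask per type and write that function's results back into just those rows, so nothing has to be merged. This is a minimal sketch, assuming each type's function can work on whole columns (pandas Series) at once, which is usually much faster than a row-by-row `apply` for hundreds of rows per type; `dog_calc`, `cat_calc` and `CALCS` are hypothetical names with placeholder arithmetic.

```python
import pandas as pd

df = pd.DataFrame([[1, 5, 'Dog'], [2, 6, 'Dog'], [3, 7, 'Cat'], [4, 8, 'Cat']],
                  columns=['A', 'B', 'Type'])

# Hypothetical vectorized functions: take the A and B columns of one type's
# rows (as Series) and return the C and D values for those rows
def dog_calc(a, b):
    return a + b, a * b

def cat_calc(a, b):
    return b - a, b * 10

CALCS = {'Dog': dog_calc, 'Cat': cat_calc}

for t, calc in CALCS.items():
    mask = df['Type'] == t
    c, d = calc(df.loc[mask, 'A'], df.loc[mask, 'B'])
    # Series values align on the index, so only the masked rows are filled;
    # the first assignment creates the column, leaving the other rows as NaN
    # until their own type is processed
    df.loc[mask, 'C'] = c
    df.loc[mask, 'D'] = d

print(df)
```

Because the first assignment creates each column with NaN placeholders, numeric results end up as floats; any type missing from `CALCS` would keep NaN in C and D.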