Reputation: 1030
I have a pandas dataframe that has 2 columns. I want to loop through it's rows and based on a string from column 2 I would like to add a string in a newly created 3th column. I tried:
for i in df.index:
if df.ix[i]['Column2']==variable1:
df['Column3'] = variable2
elif df.ix[i]['Column2']==variable3:
df['Column3'] = variable4
print(df)
But the resulting dataframe has in column 3 only Variable2.
Any ideas how else I could do this?
Upvotes: 3
Views: 12252
Reputation: 4742
Firstly, there is no need to loop through each and every index, just use pandas built in boolean indexing. First line here, we gather all of the values in Column2
that are the same as variable1
and set the same row in Column3
to be variable2
df.ix[df.Column2==variable1, 'Column3'] = variable2
df.ix[df.Column2==variable3, 'Column3'] = variable4
A simple example would be
import pandas as pd
df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog']})
print(df)
Animal
0 dog
1 fish
2 fish
3 dog
df.ix[df.Animal=='dog', 'Colour'] = 'brown'
df.ix[df.Animal=='fish', 'Colour'] = 'silver'
print(df)
Animal Colour
0 dog brown
1 fish silver
2 fish silver
3 dog brown
The above method can be build on very easily using multiple conditions like &
and |
to boolean index.
df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog'], 'Age': [1, 3, 2, 10]})
print(df)
Age Animal
0 1 dog
1 3 fish
2 2 fish
3 10 dog
df.ix[(df.Animal=='dog') & (df.Age > 8), 'Colour'] = 'grey' # old dogs go grey
df.ix[(df.Animal=='dog') & (df.Age <= 8), 'Colour'] = 'brown'
df.ix[df.Animal=='fish', 'Colour'] = 'silver'
print(df)
Age Animal Colour
0 1 dog brown
1 3 fish silver
2 2 fish silver
3 10 dog grey
Upvotes: 2
Reputation: 5921
You can also try this (if you want to keep the for
loop you use) :
new_column = []
for i in df.index:
if df.ix[i]['Column2']==variable1:
new_column.append(variable2)
elif df.ix[i]['Column2']==variable3:
new_column.append(variable4)
else : #if both conditions not verified
new_column.append(other_variable)
df['Column3'] = new_column
Upvotes: 4
Reputation: 863531
I think you can use double numpy.where
, what is faster as loop:
df['Column3'] = np.where(df['Column2']==variable1, variable2,
np.where(df['Column2']==variable3, variable4))
And if need add variable if both conditions are False
:
df['Column3'] = np.where(df['Column2']==variable1, variable2,
np.where(df['Column2']==variable3, variable4, variable5))
Sample:
df = pd.DataFrame({'Column2':[1,2,4,3]})
print (df)
Column2
0 1
1 2
2 4
3 3
variable1 = 1
variable2 = 2
variable3 = 3
variable4 = 4
variable5 = 5
df['Column3'] = np.where(df['Column2']==variable1, variable2,
np.where(df['Column2']==variable3, variable4, variable5))
print (df)
Column2 Column3
0 1 2
1 2 5
2 4 5
3 3 4
Another solution, thanks Jon Clements
:
df['Column4'] = df.Column2.map({variable1: variable2, variable3:variable4}).fillna(variable5)
print (df)
Column2 Column3 Column4
0 1 2 2.0
1 2 5 5.0
2 4 5 5.0
3 3 4 4.0
Upvotes: 2