Andrei Cozma

Reputation: 1030

How to iterate over pandas dataframe and create new column

I have a pandas dataframe with 2 columns. I want to loop through its rows and, based on the string in column 2, add a string to a newly created third column. I tried:

for i in df.index:
    if df.ix[i]['Column2']==variable1:
        df['Column3'] = variable2
    elif df.ix[i]['Column2']==variable3:
        df['Column3'] = variable4

print(df)

But in the resulting dataframe, Column3 contains only variable2.

Any ideas how else I could do this?

Upvotes: 3

Views: 12252

Answers (3)

Little Bobby Tables

Reputation: 4742

First, there is no need to loop over every index; use pandas' built-in boolean indexing instead. The first line below selects every row where Column2 equals variable1 and sets Column3 to variable2 in those rows. The second line does the same for variable3 and variable4.

# .loc is used here because .ix was removed in pandas 1.0
df.loc[df.Column2==variable1, 'Column3'] = variable2
df.loc[df.Column2==variable3, 'Column3'] = variable4

A simple example would be

import pandas as pd

df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog']})
print(df)

    Animal
0   dog
1   fish
2   fish
3   dog

df.loc[df.Animal=='dog', 'Colour'] = 'brown'
df.loc[df.Animal=='fish', 'Colour'] = 'silver'
print(df)

    Animal  Colour
0   dog     brown
1   fish    silver
2   fish    silver
3   dog     brown

The above method is easy to build on: combine multiple conditions with & (and) and | (or) in the boolean index, wrapping each condition in parentheses.

df = pd.DataFrame({'Animal':['dog', 'fish', 'fish', 'dog'], 'Age': [1, 3, 2, 10]})
print(df)

   Age Animal
0    1    dog
1    3   fish
2    2   fish
3   10    dog

df.loc[(df.Animal=='dog') & (df.Age > 8), 'Colour'] = 'grey' # old dogs go grey
df.loc[(df.Animal=='dog') & (df.Age <= 8), 'Colour'] = 'brown'
df.loc[df.Animal=='fish', 'Colour'] = 'silver'
print(df)

   Age Animal  Colour
0    1    dog   brown
1    3   fish  silver
2    2   fish  silver
3   10    dog    grey
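
As a rough sketch of why the loop in the question fills the column with a single value while the masked assignment does not: the sample data below is made up for illustration, and 'cat' is a hypothetical row that matches neither condition.

```python
import pandas as pd

# Hypothetical sample data; 'cat' matches neither condition below.
df = pd.DataFrame({'Animal': ['dog', 'fish', 'cat']})

# Assigning a scalar to df['Colour'] overwrites the whole column,
# which is what each pass of the loop in the question does.
df['Colour'] = 'brown'
whole_column = df['Colour'].tolist()

# A boolean mask inside .loc restricts the assignment to matching rows.
df = df.drop(columns='Colour')
df.loc[df['Animal'] == 'dog', 'Colour'] = 'brown'
df.loc[df['Animal'] == 'fish', 'Colour'] = 'silver'

masked = df['Colour'].tolist()
# Rows that match no condition are left as missing values (NaN).
unmatched = df['Colour'].isna().tolist()
```

Rows that no mask selects stay missing, so a final fillna or an extra condition may be needed if every row should get a value.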

Upvotes: 2

MMF

Reputation: 5921

You can also try this if you want to keep your for loop:

new_column = []

for i in df.index:
    # .loc replaces .ix, which was removed in pandas 1.0
    if df.loc[i, 'Column2'] == variable1:
        new_column.append(variable2)
    elif df.loc[i, 'Column2'] == variable3:
        new_column.append(variable4)
    else:  # neither condition matched
        new_column.append(other_variable)

df['Column3'] = new_column
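
For reference, here is a self-contained sketch of the same list-building idea. The sample data and the values bound to variable1 through variable4 and other_variable are placeholders, not taken from the question, and the loop iterates over the column values directly rather than looking each one up by index.

```python
import pandas as pd

# Hypothetical sample data and values standing in for the question's variables.
df = pd.DataFrame({'Column1': [10, 20, 30], 'Column2': ['a', 'b', 'c']})
variable1, variable2 = 'a', 'first'
variable3, variable4 = 'b', 'second'
other_variable = 'neither'

new_column = []
# Iterating over the column itself avoids one label lookup per row.
for value in df['Column2']:
    if value == variable1:
        new_column.append(variable2)
    elif value == variable3:
        new_column.append(variable4)
    else:  # neither condition matched
        new_column.append(other_variable)

# The list has exactly one entry per row, so it can be assigned as a column.
df['Column3'] = new_column
```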

Upvotes: 4

jezrael

Reputation: 863531

I think you can use nested numpy.where calls, which is faster than a loop:

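# np.where needs both a "true" and a "false" value; None marks rows matching neither condition
df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                np.where(df['Column2']==variable3, variable4, None))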

To assign a specific value when both conditions are False:

df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                np.where(df['Column2']==variable3, variable4, variable5))

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Column2':[1,2,4,3]})
print(df)
   Column2
0        1
1        2
2        4
3        3

variable1 = 1
variable2 = 2
variable3 = 3
variable4 = 4
variable5 = 5

df['Column3'] = np.where(df['Column2']==variable1, variable2, 
                np.where(df['Column2']==variable3, variable4, variable5))

print (df)
   Column2  Column3
0        1        2
1        2        5
2        4        5
3        3        4
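
Since the question is about strings, here is a minimal sketch of the same nested np.where with string values. The data and the values bound to variable1 through variable5 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical string data and values, not taken from the question.
df = pd.DataFrame({'Column2': ['a', 'b', 'c']})
variable1, variable2 = 'a', 'first'
variable3, variable4 = 'b', 'second'
variable5 = 'neither'

# The inner np.where handles the second condition and the fallback;
# the outer np.where applies the first condition on top of it.
df['Column3'] = np.where(df['Column2'] == variable1, variable2,
                np.where(df['Column2'] == variable3, variable4, variable5))
```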

Another solution, thanks to Jon Clements:

df['Column4'] = df.Column2.map({variable1: variable2, variable3:variable4}).fillna(variable5)
print (df)
   Column2  Column3  Column4
0        1        2      2.0
1        2        5      5.0
2        4        5      5.0
3        3        4      4.0
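
Column4 comes out as float because map leaves NaN for values missing from the dictionary, and NaN forces a float dtype. A possible sketch for getting integers back, assuming every value is a whole number once the gaps are filled:

```python
import pandas as pd

df = pd.DataFrame({'Column2': [1, 2, 4, 3]})
variable1, variable2, variable3, variable4, variable5 = 1, 2, 3, 4, 5

# Values not present in the dictionary become NaN, so the result is float.
mapped = df['Column2'].map({variable1: variable2, variable3: variable4})

# After filling the gaps, the column can be cast back to an integer dtype.
df['Column4'] = mapped.fillna(variable5).astype(int)
```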

Upvotes: 2
