new python pandas dataframe column based on value of variable, using function

Question

I have a variable, 'ImageName', which ranges from 0 to 1600. I want to create a new variable, 'LocationCode', based on the value of 'ImageName'.

If 'ImageName' is less than 70, I want 'LocationCode' to be 1. If 'ImageName' is between 71 and 90, I want 'LocationCode' to be 2. I have 13 different codes in all. I'm not sure how to write this in pandas. Here's what I tried:

def spatLoc(ImageName):
    if ImageName <=70:
        LocationCode = 1
    elif ImageName >70 and ImageName <=90:
        LocationCode = 2
   return LocationCode

df['test'] = df.apply(spatLoc(df['ImageName'])

but it returned an error. I'm clearly not defining things the right way but I can't figure out how to.

EdChum · Accepted Answer

You can just use 2 boolean masks:

df.loc[df['ImageName'] <= 70, 'Test'] = 1
df.loc[(df['ImageName'] > 70) & (df['ImageName'] <= 90), 'Test'] = 2

The masks set the value only on rows where the boolean condition is met. For the second mask, combine the two conditions with the & operator, and enclose each condition in parentheses, because & binds more tightly than the comparison operators.
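As a minimal, self-contained sketch of the mask approach (the sample values below are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical sample data: values on both sides of each boundary,
# plus one value (120) that matches neither mask.
df = pd.DataFrame({'ImageName': [10, 70, 71, 90, 120]})

# Each mask is a boolean Series aligned with the rows of df.
le_70 = df['ImageName'] <= 70
between_71_90 = (df['ImageName'] > 70) & (df['ImageName'] <= 90)

# Assign only on the rows where the mask is True.
df.loc[le_70, 'Test'] = 1
df.loc[between_71_90, 'Test'] = 2

print(df)
```

Rows that match neither mask are left as NaN, so the new column ends up as float until every range has been covered.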

Actually, I think it would be better to define your bin values and call pd.cut. For example:

In [20]:    
df = pd.DataFrame({'ImageName': np.random.randint(0, 100, 20)})
df

Out[20]:
    ImageName
0          48
1          78
2           5
3           4
4           9
5          81
6          49
7          11
8          57
9          17
10         92
11         30
12         74
13         62
14         83
15         21
16         97
17         11
18         34
19         78

In [22]:    
df['group'] = pd.cut(df['ImageName'], range(0, 105, 10), right=False)
df

Out[22]:
    ImageName      group
0          48   [40, 50)
1          78   [70, 80)
2           5    [0, 10)
3           4    [0, 10)
4           9    [0, 10)
5          81   [80, 90)
6          49   [40, 50)
7          11   [10, 20)
8          57   [50, 60)
9          17   [10, 20)
10         92  [90, 100)
11         30   [30, 40)
12         74   [70, 80)
13         62   [60, 70)
14         83   [80, 90)
15         21   [20, 30)
16         97  [90, 100)
17         11   [10, 20)
18         34   [30, 40)
19         78   [70, 80)

Here the bin values were generated with range, but you could pass your own list of bin edges instead. Once you have the bins, you can define a lookup dict:

In [32]:    
d = dict(zip(df['group'].unique(), range(len(df['group'].unique()))))
d

Out[32]:
{'[0, 10)': 2,
 '[10, 20)': 4,
 '[20, 30)': 9,
 '[30, 40)': 7,
 '[40, 50)': 0,
 '[50, 60)': 5,
 '[60, 70)': 8,
 '[70, 80)': 1,
 '[80, 90)': 3,
 '[90, 100)': 6}

You can now call map and add your new column:

In [33]:    
df['test'] = df['group'].map(d)
df

Out[33]:
    ImageName      group  test
0          48   [40, 50)     0
1          78   [70, 80)     1
2           5    [0, 10)     2
3           4    [0, 10)     2
4           9    [0, 10)     2
5          81   [80, 90)     3
6          49   [40, 50)     0
7          11   [10, 20)     4
8          57   [50, 60)     5
9          17   [10, 20)     4
10         92  [90, 100)     6
11         30   [30, 40)     7
12         74   [70, 80)     1
13         62   [60, 70)     8
14         83   [80, 90)     3
15         21   [20, 30)     9
16         97  [90, 100)     6
17         11   [10, 20)     4
18         34   [30, 40)     7
19         78   [70, 80)     1
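Note that the lookup dict above numbers the bins in order of first appearance in the data, which is why [40, 50) maps to 0. If the codes should follow the bin order instead, one possible sketch (assuming a pandas version where pd.cut returns a categorical column, and using test_ordered as a made-up column name) is to read the category codes directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ImageName': np.random.randint(0, 100, 20)})
df['group'] = pd.cut(df['ImageName'], range(0, 105, 10), right=False)

# cat.codes is the position of each row's bin within the sorted bin list:
# 0 for [0, 10), 1 for [10, 20), and so on. Values outside all bins get -1.
df['test_ordered'] = df['group'].cat.codes

print(df)
```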

The above can be modified to suit your needs; it just demonstrates an approach that should be fast and avoids iterating over your df.
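For the ranges in the question, a hedged sketch is to pass explicit bin edges together with the labels argument of pd.cut, so the code comes out in one step. Only the edges 70 and 90 and the maximum of 1600 come from the question; the sample values and the single catch-all third bin are assumptions for illustration, and the real list would need 13 bins.

```python
import pandas as pd

# Hypothetical sample data covering the boundaries.
df = pd.DataFrame({'ImageName': [0, 48, 70, 71, 90, 91, 1600]})

# Bin edges: [0, 70], (70, 90], (90, 1600]. With the default right=True
# each bin includes its right edge; include_lowest=True also keeps 0.
bins = [0, 70, 90, 1600]
labels = [1, 2, 3]  # one label per bin, i.e. len(bins) - 1

df['LocationCode'] = pd.cut(
    df['ImageName'], bins=bins, labels=labels, include_lowest=True
).astype(int)  # astype(int) assumes no value falls outside the bins

print(df)
```

Any value outside the outermost edges would become NaN, in which case the astype(int) step would fail and should be dropped or handled first.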
