Reputation: 1297
I have a variable, 'ImageName
' which ranges from 0-1600. I want to create a new variable, 'LocationCode
', based on the value of 'ImageName
'.
If 'ImageName
' is less than 70, I want 'LocationCode
' to be 1. if 'ImageName
' is between 71 and 90, I want 'LocationCode
' to be 2. I have 13 different codes in all. I'm not sure how to write this in python pandas. Here's what I tried:
def spatLoc(ImageName):
if ImageName <=70:
LocationCode = 1
elif ImageName >70 and ImageName <=90:
LocationCode = 2
return LocationCode
df['test'] = df.apply(spatLoc(df['ImageName'])
but it returned an error. I'm clearly not defining things the right way but I can't figure out how to.
Upvotes: 2
Views: 3098
Reputation: 633
In Python, you use the dictionary lookup notation to find a field within a row. The field name is ImageName
. In the spatLoc()
function below, the parameter row is a dictionary containing the entire row, and you would find an individual column by using the field name as key to the dictionary.
def spatLoc(row):
if row['ImageName'] <=70:
LocationCode = 1
elif row['ImageName'] >70 and row['ImageName'] <=90:
LocationCode = 2
return LocationCode
df['test'] = df.apply(spatLoc, axis=1)
Upvotes: 0
Reputation: 394071
You can just use 2 boolean masks:
df.loc[df['ImageName'] <= 70, 'Test'] = 1
df.loc[(df['ImageName'] > 70) & (df['ImageName'] <= 90), 'Test'] = 2
By using the masks you only set the value where the boolean condition is met, for the second mask you need to use the &
operator to and
the conditions and enclose the conditions in parentheses due to operator precedence
Actually I think it would be better to define your bin values and call cut
, example:
In [20]:
df = pd.DataFrame({'ImageName': np.random.randint(0, 100, 20)})
df
Out[20]:
ImageName
0 48
1 78
2 5
3 4
4 9
5 81
6 49
7 11
8 57
9 17
10 92
11 30
12 74
13 62
14 83
15 21
16 97
17 11
18 34
19 78
In [22]:
df['group'] = pd.cut(df['ImageName'], range(0, 105, 10), right=False)
df
Out[22]:
ImageName group
0 48 [40, 50)
1 78 [70, 80)
2 5 [0, 10)
3 4 [0, 10)
4 9 [0, 10)
5 81 [80, 90)
6 49 [40, 50)
7 11 [10, 20)
8 57 [50, 60)
9 17 [10, 20)
10 92 [90, 100)
11 30 [30, 40)
12 74 [70, 80)
13 62 [60, 70)
14 83 [80, 90)
15 21 [20, 30)
16 97 [90, 100)
17 11 [10, 20)
18 34 [30, 40)
19 78 [70, 80)
Here the bin values were generated using range
but you could pass your list of bin values yourself, once you have the bin values you can define a lookup dict:
In [32]:
d = dict(zip(df['group'].unique(), range(len(df['group'].unique()))))
d
Out[32]:
{'[0, 10)': 2,
'[10, 20)': 4,
'[20, 30)': 9,
'[30, 40)': 7,
'[40, 50)': 0,
'[50, 60)': 5,
'[60, 70)': 8,
'[70, 80)': 1,
'[80, 90)': 3,
'[90, 100)': 6}
You can now call map
and add your new column:
In [33]:
df['test'] = df['group'].map(d)
df
Out[33]:
ImageName group test
0 48 [40, 50) 0
1 78 [70, 80) 1
2 5 [0, 10) 2
3 4 [0, 10) 2
4 9 [0, 10) 2
5 81 [80, 90) 3
6 49 [40, 50) 0
7 11 [10, 20) 4
8 57 [50, 60) 5
9 17 [10, 20) 4
10 92 [90, 100) 6
11 30 [30, 40) 7
12 74 [70, 80) 1
13 62 [60, 70) 8
14 83 [80, 90) 3
15 21 [20, 30) 9
16 97 [90, 100) 6
17 11 [10, 20) 4
18 34 [30, 40) 7
19 78 [70, 80) 1
The above can be modified to suit your needs but it's just to demonstrate an approach which should be fast and without the need to iterate over your df.
Upvotes: 3