Reputation: 11
I checked similar questions posted about slicing DFs in Python but they didn't explain the inconsistency I'm seeing in my exercise.
The code works with the known diamonds data frame. Top lines of the data frame are:
carat cut color clarity depth table price x y z
0 0.23 Ideal E SI2 61.5 55.0 326 3.95 3.98 2.43
1 0.21 Premium E SI1 59.8 61.0 326 3.89 3.84 2.31
2 0.23 Good E VS1 56.9 65.0 327 4.05 4.07 2.31
I have to create a slicing function which takes 4 arguments: DataFrame 'df', a column of that DataFrame 'col', the label of another column 'label' and two values 'val1' and 'val2'. The function will take the frame and output the entries of the column indicated by the 'label' argument for which the rows of the column 'col' are greater than the number 'val1' and less than 'val2'.
The following stand-alone piece of code gives me the correct answer:
diamonds.loc[(diamonds.carat > 1.1) & (diamonds.carat < 1.4),['price']]
and I get the price from the rows where the carat value is between 1.1 and 1.4.
However, when I try to use this syntax in a function, it doesn't work and I get an error.
Function:
def slice2(df,col,output_label,val1,val2):
res = df.loc[(col > val1) & (col < val2), ['output_label']]
return res
Function call:
slice2(diamonds,diamonds.carat,'price',1.1,1.4)
Error:
"None of [['output_label']] are in the [columns]"
Full traceback message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-64-adc582faf6cc> in <module>()
----> 1 exercise2(test_df,test_df.carat,'price',1.1,1.4)
<ipython-input-63-556b71ba172d> in exercise2(df, col, output_label, val1, val2)
1 def exercise2(df,col,output_label,val1,val2):
----> 2 res = df.loc[(col > val1) & (col < val2), ['output_label']]
3 return res
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
1323 except (KeyError, IndexError):
1324 pass
-> 1325 return self._getitem_tuple(key)
1326 else:
1327 key = com._apply_if_callable(key, self.obj)
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
839
840 # no multi-index, so validate all of the indexers
--> 841 self._has_valid_tuple(tup)
842
843 # ugly hack for GH #836
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
187 if i >= self.obj.ndim:
188 raise IndexingError('Too many indexers')
--> 189 if not self._has_valid_type(k, i):
190 raise ValueError("Location based indexing can only have [%s] "
191 "types" % self._valid_types)
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
1416
1417 raise KeyError("None of [%s] are in the [%s]" %
-> 1418 (key, self.obj._get_axis_name(axis)))
1419
1420 return True
KeyError: "None of [['output_label']] are in the [columns]"
I'm not very advanced in Python and after looking at this code for a while I haven't been able to figure out what the problem is. Maybe I'm blind to something obvious here and would appreciate any pointed on how to get the function to work or how to redo it so that it gives the same result as the single line code.
Thanks
Upvotes: 0
Views: 1006
Reputation: 602
In your function
def slice2(df,col,output_label,val1,val2):
res = df.loc[(col > val1) & (col < val2), ['output_label']]
return res
you are searching for the column with name 'output_label' instead of using your parameter (you are assigning its value directly instead of using your value!)
This should work:
def slice2(df,col,output_label,val1,val2):
res = df.loc[(col > val1) & (col < val2), [output_label]] # notice that there are not quotes
return res
Upvotes: 4