eNTROPY

Reputation: 11

Python dataframe slicing doesn't work in a function but works stand-alone

I checked similar questions posted about slicing DFs in Python but they didn't explain the inconsistency I'm seeing in my exercise.

The code uses the well-known diamonds data frame. The top lines of the data frame are:

     carat        cut color clarity  depth  table  price     x     y     z
0     0.23      Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1     0.21    Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2     0.23       Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31

I have to create a slicing function that takes five arguments: a DataFrame 'df', a column of that DataFrame 'col', the label of another column 'output_label', and two values 'val1' and 'val2'. The function should return the entries of the column named by 'output_label' for the rows where 'col' is greater than 'val1' and less than 'val2'.

The following stand-alone piece of code gives me the correct answer:

diamonds.loc[(diamonds.carat > 1.1) & (diamonds.carat < 1.4),['price']]

and I get the price from the rows where the carat value is between 1.1 and 1.4.

However, when I try to use this syntax in a function, it doesn't work and I get an error.

Function:

def slice2(df,col,output_label,val1,val2):
    res = df.loc[(col > val1) & (col < val2), ['output_label']]
    return res

Function call:

slice2(diamonds,diamonds.carat,'price',1.1,1.4)

Error:

"None of [['output_label']] are in the [columns]" 

Full traceback message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-64-adc582faf6cc> in <module>()
----> 1 exercise2(test_df,test_df.carat,'price',1.1,1.4)

<ipython-input-63-556b71ba172d> in exercise2(df, col, output_label, val1, val2)
      1 def exercise2(df,col,output_label,val1,val2):
----> 2     res = df.loc[(col > val1) & (col < val2), ['output_label']]
      3     return res
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1323             except (KeyError, IndexError):
   1324                 pass
-> 1325             return self._getitem_tuple(key)
   1326         else:
   1327             key = com._apply_if_callable(key, self.obj)
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    839 
    840         # no multi-index, so validate all of the indexers
--> 841         self._has_valid_tuple(tup)
    842 
    843         # ugly hack for GH #836
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    187             if i >= self.obj.ndim:
    188                 raise IndexingError('Too many indexers')
--> 189             if not self._has_valid_type(k, i):
    190                 raise ValueError("Location based indexing can only have [%s] "
    191                                  "types" % self._valid_types)
/Users/jojo/Library/Enthought/Canopy/edm/envs/User/lib/python3.5/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1416 
   1417                 raise KeyError("None of [%s] are in the [%s]" %
-> 1418                                (key, self.obj._get_axis_name(axis)))
   1419 
   1420             return True
KeyError: "None of [['output_label']] are in the [columns]" 

I'm not very advanced in Python, and after looking at this code for a while I haven't been able to figure out what the problem is. Maybe I'm blind to something obvious here. I would appreciate any pointers on how to get the function to work, or how to rewrite it so that it gives the same result as the single-line code.

Thanks

Upvotes: 0

Views: 1006

Answers (1)

jcf

Reputation: 602

In your function

def slice2(df,col,output_label,val1,val2):
    res = df.loc[(col > val1) & (col < val2), ['output_label']]
    return res

you are looking up a column literally named 'output_label' instead of using your parameter. The quotes turn it into a string literal, so the value you passed in ('price') is never used.
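To make the difference concrete, here is a minimal sketch in plain Python (no pandas needed). The variable names `quoted` and `unquoted` are only for illustration and are not part of the original code:

```python
output_label = "price"  # the value the caller passed in

quoted = ['output_label']   # a list holding the literal text 'output_label'
unquoted = [output_label]   # a list holding the parameter's value

print(quoted)    # ['output_label']
print(unquoted)  # ['price']
```

The first list is what `.loc` received in the failing function, which is why the KeyError mentions `['output_label']`.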

This should work:

def slice2(df,col,output_label,val1,val2):
    res = df.loc[(col > val1) & (col < val2), [output_label]]  # no quotes: use the parameter's value
    return res
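
If you want to check this without the full diamonds data, the following is a small self-contained sketch. The `toy` frame and its values are made up for illustration, and `slice_by_name` is a hypothetical variant (not something from the question) that takes the filter column's name as a string instead of the Series itself:

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data (values invented for illustration).
toy = pd.DataFrame({
    "carat": [0.23, 1.20, 1.30, 1.50],
    "price": [326, 5000, 6000, 9000],
})

def slice2(df, col, output_label, val1, val2):
    # col is a Series (e.g. df.carat); output_label is a column name string.
    return df.loc[(col > val1) & (col < val2), [output_label]]

def slice_by_name(df, col_name, output_label, val1, val2):
    # Hypothetical variant: look the filter column up by name inside the function.
    col = df[col_name]
    return df.loc[(col > val1) & (col < val2), [output_label]]

res = slice2(toy, toy.carat, "price", 1.1, 1.4)
res_by_name = slice_by_name(toy, "carat", "price", 1.1, 1.4)
print(res)  # the price column for the rows with carat 1.20 and 1.30
```

Passing the column name rather than the Series keeps the call shorter and guarantees the mask comes from the same frame being sliced, but either form should work once the quotes are removed.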

Upvotes: 4
