Select columns that a Pandas dataframe was grouped by

Question

I have a pandas dataframe flsa:

flsa[:10]

        auc  topics       ww  top-n  fold
0  0.668729      11  entropy     10     1
1  0.609736      11  entropy     10     2
2  0.654445      11  entropy     10     3
3  0.612886      11  entropy     10     4
4  0.596460      11  entropy     10     5
5  0.654208      11  entropy     15     1
6  0.620610      11  entropy     15     2
7  0.637275      11  entropy     15     3
8  0.603725      11  entropy     15     4
9  0.596100      11  entropy     15     5

Now, I group them as follows:

mean_flsa_auc = flsa.groupby(['topics', 'ww']).mean('auc').drop('fold', axis=1).drop('top-n', axis=1)

Resulting in:

mean_flsa_auc[:10]

                     auc
topics ww               
3      entropy  0.610580
       idf      0.593962
       normal   0.623830
       probidf  0.598362
5      entropy  0.623360
       idf      0.619105
       normal   0.644371
       probidf  0.617489
7      entropy  0.631131
       idf      0.624773

Now I would like to make a line chart with topics on the x-axis, auc on the y-axis, and four lines: entropy, idf, normal and probidf.

However, when I try to select all 'entropy' values:

mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']

I get the following error:

Traceback (most recent call last):
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ww'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "", line 1, in
    mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'ww'

I suspect that I am treating mean_flsa_auc as a DataFrame when it is now a groupby object, but I don't know how to change my code so that I get a list of all the entropy values in it.

Who can help me with this?

SeaBean · Accepted Answer

You can pass as_index=False to your groupby() call so that the grouped-by fields are kept as regular columns:

mean_flsa_auc = flsa.groupby(['topics', 'ww'], as_index=False).mean('auc').drop('fold', axis=1).drop('top-n', axis=1)

By default, groupby() turns the grouped-by fields into the index (here a two-level MultiIndex on topics and ww), so you can no longer access them like ordinary data columns. With as_index=False, these fields are not set as the index and remain regular data columns.
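A minimal sketch of the as_index=False route, using a small made-up frame that only borrows flsa's column names (the values are invented for illustration, not the asker's data). It also selects auc before aggregating with ['auc'].mean(), which makes the two drop() calls unnecessary. In the question's .mean('auc'), the string is passed to mean()'s first positional parameter (numeric_only in recent pandas versions) rather than selecting a column, so it only works by accident.

```python
import pandas as pd

# Hypothetical toy data with the same columns as flsa (values invented).
flsa = pd.DataFrame({
    "auc":    [0.60, 0.62, 0.58, 0.64, 0.61, 0.63, 0.59, 0.65],
    "topics": [3, 3, 3, 3, 5, 5, 5, 5],
    "ww":     ["entropy", "entropy", "idf", "idf", "entropy", "entropy", "idf", "idf"],
    "top-n":  [10, 15, 10, 15, 10, 15, 10, 15],
    "fold":   [1, 1, 1, 1, 1, 1, 1, 1],
})

# as_index=False keeps 'topics' and 'ww' as ordinary columns;
# selecting ['auc'] first means only auc is averaged, so no drop() is needed.
mean_flsa_auc = flsa.groupby(["topics", "ww"], as_index=False)["auc"].mean()
print(mean_flsa_auc.columns.tolist())  # ['topics', 'ww', 'auc']

# The boolean filter from the question now works because 'ww' is a column.
entropy_rows = mean_flsa_auc[mean_flsa_auc["ww"] == "entropy"]
print(entropy_rows)
```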

Alternatively, keep your existing code and call .reset_index() afterwards to move the index levels back into regular data columns:

mean_flsa_auc = mean_flsa_auc.reset_index()

Then ww is an ordinary column again, so the filter from the question, mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy'], works.
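A similar sketch for the reset_index() route, again on invented toy data that only borrows flsa's column names; it averages with [['auc']].mean() instead of the question's .mean('auc') chain. For comparison it also shows DataFrame.xs, which filters the ww level of the MultiIndex directly without resetting. Finally, pivot() reshapes the means into one column per ww value with topics as the index, which is the layout the chart in the question needs.

```python
import pandas as pd

# Hypothetical toy data with flsa's column names (values invented).
flsa = pd.DataFrame({
    "auc":    [0.61, 0.59, 0.62, 0.60, 0.63, 0.62, 0.64, 0.61],
    "topics": [3, 3, 3, 3, 5, 5, 5, 5],
    "ww":     ["entropy", "idf", "normal", "probidf"] * 2,
    "top-n":  [10] * 8,
    "fold":   [1] * 8,
})

# Default groupby: 'topics' and 'ww' become a two-level index.
mean_flsa_auc = flsa.groupby(["topics", "ww"])[["auc"]].mean()
print(list(mean_flsa_auc.index.names))  # ['topics', 'ww']

# Option A: select on the 'ww' index level directly, without resetting.
entropy_via_xs = mean_flsa_auc.xs("entropy", level="ww")

# Option B: reset_index() turns the index levels back into ordinary columns.
mean_flsa_auc = mean_flsa_auc.reset_index()
entropy_rows = mean_flsa_auc[mean_flsa_auc["ww"] == "entropy"]

# Wide layout for plotting: index = topics, one column per ww value.
wide = mean_flsa_auc.pivot(index="topics", columns="ww", values="auc")
print(wide)
```

Calling wide.plot(marker='o') (matplotlib required) would then draw topics on the x-axis with one line per ww value, since DataFrame.plot uses the index as x and plots each column as a separate line.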
