Reputation: 1722
I have a pandas DataFrame flsa:
flsa[:10]
auc topics ww top-n fold
0 0.668729 11 entropy 10 1
1 0.609736 11 entropy 10 2
2 0.654445 11 entropy 10 3
3 0.612886 11 entropy 10 4
4 0.596460 11 entropy 10 5
5 0.654208 11 entropy 15 1
6 0.620610 11 entropy 15 2
7 0.637275 11 entropy 15 3
8 0.603725 11 entropy 15 4
9 0.596100 11 entropy 15 5
Now, I group them as follows:
mean_flsa_auc = flsa.groupby(['topics','ww']).mean('auc').drop('fold', axis = 1).drop('top-n', axis=1)
Resulting in:
mean_flsa_auc[:10]
auc
topics ww
3 entropy 0.610580
idf 0.593962
normal 0.623830
probidf 0.598362
5 entropy 0.623360
idf 0.619105
normal 0.644371
probidf 0.617489
7 entropy 0.631131
idf 0.624773
Now, I would like to make a line chart with topics on the x-axis, auc on the y-axis, and four lines: entropy, idf, normal and probidf.
However, whenever I want to select all 'entropy' values:
mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']
I get the following error:
Traceback (most recent call last):
File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ww'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<ipython-input-490-0dacb2bb9cf3>", line 1, in <module>
mean_flsa_auc[mean_flsa_auc['ww'] == 'entropy']
File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: 'ww'
I suspect that I am treating mean_flsa_auc as a DataFrame object while it is now a groupby object. But I don't know how to change my code so that I get a list of all the entropy values in the groupby object.
Who can help me with this?
Upvotes: 0
Views: 84
Reputation: 23217
You can use as_index=False in your groupby() statement to keep the grouped-by fields as ordinary columns, as follows:
mean_flsa_auc = flsa.groupby(['topics','ww'], as_index=False).mean('auc').drop('fold', axis = 1).drop('top-n', axis=1)
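A self-contained sketch of the as_index=False route. The small flsa frame below is a hypothetical toy stand-in with the question's column layout and made-up auc values. Selecting ["auc"] before .mean() is a variant of the posted line (not the answer's exact code): only auc is averaged, so the two .drop() calls are no longer needed.

```python
import pandas as pd

# Hypothetical toy stand-in for flsa: same columns as the question, made-up values
flsa = pd.DataFrame({
    "auc": [0.60, 0.62, 0.58, 0.64, 0.66, 0.62],
    "topics": [3, 3, 3, 5, 5, 5],
    "ww": ["entropy", "entropy", "idf", "entropy", "idf", "idf"],
    "top-n": [10, 15, 10, 10, 10, 15],
    "fold": [1, 1, 1, 1, 1, 1],
})

# as_index=False keeps topics and ww as ordinary columns;
# selecting "auc" first means only that column is averaged
mean_flsa_auc = flsa.groupby(["topics", "ww"], as_index=False)["auc"].mean()

# ww is a regular column now, so the original boolean filter works
entropy = mean_flsa_auc[mean_flsa_auc["ww"] == "entropy"]
print(entropy)
```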
By default, groupby() sets the grouped-by fields as the index, so you cannot access them as ordinary data columns as before. With the parameter as_index=False, these fields are not set as the index and remain as data columns.
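To see why mean_flsa_auc['ww'] raised KeyError, here is a sketch using the same hypothetical toy data that inspects the default result. .mean() returns a regular DataFrame, not a groupby object, but topics and ww live in a two-level MultiIndex instead of the columns. If you would rather keep that index, DataFrame.xs with level= can select one ww value directly. This is an alternative the answer does not mention.

```python
import pandas as pd

# Hypothetical toy stand-in for flsa (made-up values)
flsa = pd.DataFrame({
    "auc": [0.60, 0.62, 0.58, 0.64, 0.66, 0.62],
    "topics": [3, 3, 3, 5, 5, 5],
    "ww": ["entropy", "entropy", "idf", "entropy", "idf", "idf"],
    "top-n": [10, 15, 10, 10, 10, 15],
    "fold": [1, 1, 1, 1, 1, 1],
})

# Default as_index=True: the group keys become a two-level row index
mean_flsa_auc = flsa.groupby(["topics", "ww"])[["auc"]].mean()

print(type(mean_flsa_auc).__name__)     # DataFrame
print(list(mean_flsa_auc.index.names))  # ['topics', 'ww']
print(list(mean_flsa_auc.columns))      # ['auc'], so there is no 'ww' column

# Select by index level instead of by column; the ww level is dropped
entropy = mean_flsa_auc.xs("entropy", level="ww")
print(entropy)
```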
Alternatively, you can keep your existing code and call .reset_index() afterwards to move the index fields back into the data columns, as follows:
mean_flsa_auc = mean_flsa_auc.reset_index()
Then, you can access the ww column.
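Building on the reset_index() route, here is a sketch of the chart described in the question: topics on the x-axis, auc on the y-axis, and one line per ww value. It uses the same hypothetical toy data. DataFrame.pivot reshapes the long table into one column per weighting scheme. plot_auc is a hypothetical helper name, and plotting through pandas requires matplotlib to be installed.

```python
import pandas as pd

# Hypothetical toy stand-in for flsa (made-up values)
flsa = pd.DataFrame({
    "auc": [0.60, 0.62, 0.58, 0.64, 0.66, 0.62],
    "topics": [3, 3, 3, 5, 5, 5],
    "ww": ["entropy", "entropy", "idf", "entropy", "idf", "idf"],
    "top-n": [10, 15, 10, 10, 10, 15],
    "fold": [1, 1, 1, 1, 1, 1],
})

# Question-style grouping (MultiIndex result), then move the keys back to columns
mean_flsa_auc = flsa.groupby(["topics", "ww"])[["auc"]].mean().reset_index()

# Long -> wide: one row per topics value, one column per ww value
wide = mean_flsa_auc.pivot(index="topics", columns="ww", values="auc")


def plot_auc(wide_df: pd.DataFrame):
    """Hypothetical helper: draw one line per ww column against topics (needs matplotlib)."""
    ax = wide_df.plot(marker="o")
    ax.set_xlabel("topics")
    ax.set_ylabel("auc")
    return ax
```

Calling plot_auc(wide) followed by matplotlib.pyplot.show() should display the chart. On the full data, the columns would be entropy, idf, normal and probidf. If you start from the MultiIndex result instead, mean_flsa_auc['auc'].unstack('ww') builds the same wide table without resetting the index.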
Upvotes: 1