Reputation: 1353
I'm using pandas with some data like the following:
   User Code Group Task Type      Time
0        u00   G00        1D  3.378195
1        u00   G00        1D  3.032764
2        u00   G00        1D  3.391991
3        u00   G00        2D  4.035652
4        u00   G00        2D  2.991456
5        u00   G00        2D  3.972600
6        u01   G01        2D  3.236271
7        u01   G01        2D  3.313933
8        u01   G01        2D  3.053321
9        u01   G01        1D  3.439581
10       u01   G01        1D  3.526108
11       u01   G01        1D  3.392685
...
What I'm doing now is grouping the data to obtain the average time for the 2 task types, like this:
mean_data = data.groupby(['User Code','Group','Task Type']).mean()
And I obtain a dataframe like this:
                               Time
User Code Group Task Type
u00       G00   1D         3.727686
                2D         4.193184
u01       G01   1D         3.507185
                2D         3.462133
u02       G01   1D         2.111048
                2D         1.582493
...
I'm not sure whether I'm doing this correctly, because I don't understand why Time appears on one row and the other fields on another row. Now I want to plot these results using matplotlib and observe the difference between groups and tasks, to understand whether the times depend on the group or the task. But I really don't know how to do it...
I know I haven't included an attempt, but that is because I really don't know how to approach it. For example, if I want a double bar plot where the x-axis is the user, the y-axis is the time, and one set of bars is task 1D and the other is task 2D, how do I get this from the dataframe?
Thank you very much!
Upvotes: 0
Views: 27
Reputation: 10545
Your groupby operation works fine. The reason Time appears one row above the other labels is that groupby has turned the grouping columns into the index of the result, a structure called a MultiIndex, while Time is the only remaining data column. I suppose the output is formatted this way to make it easier to distinguish the index columns from the data value columns.
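To make the MultiIndex point concrete, here is a minimal sketch that rebuilds only the twelve rows shown in the question (the full dataset is not available, so the means differ from the ones posted). The names flat and wide are just illustrative. It inspects the index and shows two common ways to get back to ordinary columns:

```python
import pandas as pd

# Only the sample rows shown in the question; the real data has more rows.
data = pd.DataFrame({
    "User Code": ["u00"] * 6 + ["u01"] * 6,
    "Group": ["G00"] * 6 + ["G01"] * 6,
    "Task Type": ["1D"] * 3 + ["2D"] * 3 + ["2D"] * 3 + ["1D"] * 3,
    "Time": [3.378195, 3.032764, 3.391991, 4.035652, 2.991456, 3.972600,
             3.236271, 3.313933, 3.053321, 3.439581, 3.526108, 3.392685],
})

mean_data = data.groupby(["User Code", "Group", "Task Type"]).mean()

# The grouping keys are now index levels; Time is the only column.
print(type(mean_data.index).__name__)
print(list(mean_data.index.names))
print(list(mean_data.columns))

# reset_index() turns the index levels back into ordinary columns.
flat = mean_data.reset_index()
print(flat)

# unstack() moves Task Type from the index into the columns,
# giving one row per (User Code, Group) and one column per task type.
wide = mean_data["Time"].unstack("Task Type")
print(wide)
```

The wide layout is also the shape that pandas' own wide.plot(kind="bar") expects if you would rather stay with plain matplotlib.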
For plotting this, I recommend using seaborn. Then you can pass your original data frame to the plotting function and specify by which variables to group in which way. This part of the official tutorial would be a good place to start.
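As a rough sketch of what that could look like for the grouped bar plot described in the question, assuming seaborn and matplotlib are installed and again using only the sample rows from the question: sns.barplot takes the original, ungrouped data frame and computes the mean per bar itself.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Only the sample rows shown in the question; the real data has more rows.
data = pd.DataFrame({
    "User Code": ["u00"] * 6 + ["u01"] * 6,
    "Group": ["G00"] * 6 + ["G01"] * 6,
    "Task Type": ["1D"] * 3 + ["2D"] * 3 + ["2D"] * 3 + ["1D"] * 3,
    "Time": [3.378195, 3.032764, 3.391991, 4.035652, 2.991456, 3.972600,
             3.236271, 3.313933, 3.053321, 3.439581, 3.526108, 3.392685],
})

fig, ax = plt.subplots()

# One cluster of bars per user, one bar per task type.
# barplot aggregates with the mean by default and draws error bars.
sns.barplot(data=data, x="User Code", y="Time", hue="Task Type", ax=ax)
ax.set_title("Mean time per user and task type")

# Call plt.show() to display the figure, or fig.savefig(...) to write it to a file.
```

To compare groups rather than users, the same call should work with x="Group" in place of x="User Code".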
Upvotes: 1