Plot data from a DataFrame to understand it better - Pandas, matplotlib

Question

I'm using pandas with data like the following:

    User Code   Group   Task Type   Time
0   u00         G00     1D          3.378195
1   u00         G00     1D          3.032764
2   u00         G00     1D          3.391991
3   u00         G00     2D          4.035652
4   u00         G00     2D          2.991456
5   u00         G00     2D          3.972600
6   u01         G01     2D          3.236271
7   u01         G01     2D          3.313933
8   u01         G01     2D          3.053321
9   u01         G01     1D          3.439581
10  u01         G01     1D          3.526108
11  u01         G01     1D          3.392685
...

What I'm doing now is grouping the data and computing the average time for each of the two task types, like this:

    mean_data = data.groupby(['User Code','Group','Task Type']).mean()
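
A self-contained version of this step, as a sketch: it rebuilds only the twelve sample rows shown above (the full data set is elided with "..."), so its averages will not match the numbers in the output below, which come from the full data. Printing `mean_data.index.names` shows that the three grouping columns have become index levels.

```python
import pandas as pd

# Only the twelve sample rows from the question; the real data set is larger.
data = pd.DataFrame({
    "User Code": ["u00"] * 6 + ["u01"] * 6,
    "Group": ["G00"] * 6 + ["G01"] * 6,
    "Task Type": ["1D"] * 3 + ["2D"] * 3 + ["2D"] * 3 + ["1D"] * 3,
    "Time": [3.378195, 3.032764, 3.391991, 4.035652, 2.991456, 3.972600,
             3.236271, 3.313933, 3.053321, 3.439581, 3.526108, 3.392685],
})

# Grouping by the three non-numeric columns leaves Time as the only data column;
# the grouping columns become the levels of the row index.
mean_data = data.groupby(["User Code", "Group", "Task Type"]).mean()
print(mean_data)
print(mean_data.index.names)
```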

And I obtain a DataFrame like this:

                                    Time
    User Code   Group   Task Type   
    u00         G00     1D          3.727686
                        2D          4.193184
    u01         G01     1D          3.507185
                        2D          3.462133
    u02         G01     1D          2.111048
                        2D          1.582493
    ...

I'm not sure I'm doing this correctly, because I don't understand why Time appears on one row and the other fields on another row. Now I want to plot these results using matplotlib and compare the groups and tasks, to understand whether the times depend on the group or on the task. But I really don't know how to do it...

I know I haven't shown an attempt, but that's because I really don't know how to approach it. For example, if I want a double (grouped) bar plot where the x-axis is the user, the y-axis is the time, and one set of bars is task 1D and the other task 2D, how do I get this from the DataFrame?

Thank you very much!

Arne · Accepted Answer

Your groupby operation works fine. Time appears one row above the other labels because groupby has turned User Code, Group and Task Type into a hierarchical row index, called a MultiIndex, while Time remains the only data column. I suppose the output is formatted this way to make it easier to distinguish the index columns from the data value columns.
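
If you would rather stay with pandas and matplotlib, one option (an alternative to the seaborn route, not the approach this answer recommends) is to pivot the MultiIndex: `unstack("Task Type")` moves that index level into columns, giving one row per user/group pair and one column per task type, and `DataFrame.plot.bar` then draws the two task types side by side. This is a minimal sketch using only the twelve sample rows from the question; the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Only the twelve sample rows from the question.
data = pd.DataFrame({
    "User Code": ["u00"] * 6 + ["u01"] * 6,
    "Group": ["G00"] * 6 + ["G01"] * 6,
    "Task Type": ["1D"] * 3 + ["2D"] * 3 + ["2D"] * 3 + ["1D"] * 3,
    "Time": [3.378195, 3.032764, 3.391991, 4.035652, 2.991456, 3.972600,
             3.236271, 3.313933, 3.053321, 3.439581, 3.526108, 3.392685],
})

mean_data = data.groupby(["User Code", "Group", "Task Type"]).mean()

# Move the "Task Type" index level into columns:
# rows = (User Code, Group), columns = 1D / 2D.
wide = mean_data["Time"].unstack("Task Type")
print(wide)

# One pair of bars per user, one bar per task type.
ax = wide.plot.bar(rot=0)
ax.set_xlabel("User Code, Group")
ax.set_ylabel("Mean time")
plt.tight_layout()
plt.savefig("mean_time_by_user.png")  # hypothetical file name
```

The x tick labels combine the user and the group, since both remain in the row index. To open a window instead of writing a file, drop the `matplotlib.use("Agg")` line and replace `plt.savefig(...)` with `plt.show()`.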

For plotting this, I recommend using seaborn. Then you can pass your original (ungrouped) data frame to the plotting function and specify which variables to group by and how. This part of the official tutorial would be a good place to start.
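
A minimal sketch of the seaborn approach, assuming seaborn is installed and using the twelve sample rows from the question in place of the full data. Unlike the groupby route, `sns.barplot` takes the unaggregated rows: it groups them by `x` and `hue` and draws the mean of `y` for each combination (its default estimator), with error bars (a confidence interval by default). The output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Only the twelve sample rows from the question; no groupby needed beforehand.
data = pd.DataFrame({
    "User Code": ["u00"] * 6 + ["u01"] * 6,
    "Group": ["G00"] * 6 + ["G01"] * 6,
    "Task Type": ["1D"] * 3 + ["2D"] * 3 + ["2D"] * 3 + ["1D"] * 3,
    "Time": [3.378195, 3.032764, 3.391991, 4.035652, 2.991456, 3.972600,
             3.236271, 3.313933, 3.053321, 3.439581, 3.526108, 3.392685],
})

# One bar per (User Code, Task Type) pair; bar height = mean Time of those rows.
ax = sns.barplot(data=data, x="User Code", y="Time", hue="Task Type")
ax.set_ylabel("Mean time")
plt.tight_layout()
plt.savefig("time_by_user_seaborn.png")  # hypothetical file name
```

Changing `x="User Code"` to `x="Group"` averages over all users in each group instead, which is a more direct way to see whether the time depends on the group, the task type, or both.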
