Sharki

Reputation: 375

Boxplot Pandas data

DataFrame is as follows:

        ID1             ID2 
0   00:00:01.002    00:00:01.002
1   00:00:01.001    00:00:01.006
2   00:00:01.004    00:00:01.011
3   00:00:00.998    00:00:01.012
4       NaT         00:00:01.000
                ...
20      NaT         00:00:00.998

What I am trying to do is create a boxplot for each ID. There may or may not be multiple IDs depending on the dataset I provide. For right now I am trying to solve this for 2 datasets. If possible I would like a solution that has all the data on the same boxplot and then another with the data displayed on its own boxplot per ID.

I am very new to pandas (trying to learn it...) and am just getting frustrated at how long this is taking to figure out... Here is my code...

deltaT = pd.DataFrame() #Create blank df
for x in range(0, len(totIDs)):
   ID = IDList[x]
   df = pd.DataFrame(data[ID]).T
   deltaT[ID] = pd.to_datetime(df[TIME_COL]).diff() #Time between consecutive rows for this ID
deltaT.boxplot()

It looks pretty simple, but I just can't seem to get it to do what I want, which is to plot a boxplot for each ID. I should note that data is given to me by a homegrown file reader that takes several complex files and sorts them into the data dictionary, which is indexed by ID.

I am running pandas version 0.14.0 and Python version 2.7.7.

Upvotes: 2

Views: 901

Answers (1)

jezrael

Reputation: 862651

I am not sure how this works in version 0.14.0, because the latest version is 0.19.2. I recommend upgrading if possible:

#sample data
import numpy as np
import pandas as pd

np.random.seed(180)
dates = pd.date_range('2017-01-01 10:11:20', periods=10, freq='min')
cols = ['ID1','ID2']
df = pd.DataFrame(np.random.choice(dates, size=(10,2)), columns=cols)
print(df)
                  ID1                 ID2
0 2017-01-01 10:12:20 2017-01-01 10:17:20
1 2017-01-01 10:16:20 2017-01-01 10:20:20
2 2017-01-01 10:18:20 2017-01-01 10:17:20
3 2017-01-01 10:12:20 2017-01-01 10:16:20
4 2017-01-01 10:14:20 2017-01-01 10:18:20
5 2017-01-01 10:18:20 2017-01-01 10:19:20
6 2017-01-01 10:17:20 2017-01-01 10:12:20
7 2017-01-01 10:13:20 2017-01-01 10:17:20
8 2017-01-01 10:16:20 2017-01-01 10:11:20
9 2017-01-01 10:13:20 2017-01-01 10:19:20

Call DataFrame.diff and then convert timedeltas to total_seconds:

df = df.diff().apply(lambda x: x.dt.total_seconds())
print(df)
     ID1    ID2
0    NaN    NaN
1  240.0  180.0
2  120.0 -180.0
3 -360.0  -60.0
4  120.0  120.0
5  240.0   60.0
6  -60.0 -420.0
7 -240.0  300.0
8  180.0 -360.0
9 -180.0  480.0
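
The columns in the question already look like time deltas with NaT gaps rather than timestamps, so the diff step may not be needed there. A minimal sketch of just the conversion step, assuming the values can be parsed by pd.to_timedelta; the rows are copied from the start of the question's table, and the names deltas and seconds are only illustrative:

```python
import pandas as pd

# First five rows of the question's table; pd.NaT marks the missing value.
deltas = pd.DataFrame({
    'ID1': pd.to_timedelta(['00:00:01.002', '00:00:01.001', '00:00:01.004',
                            '00:00:00.998', pd.NaT]),
    'ID2': pd.to_timedelta(['00:00:01.002', '00:00:01.006', '00:00:01.011',
                            '00:00:01.012', '00:00:01.000']),
})

# Convert every timedelta column to float seconds; NaT turns into NaN.
seconds = deltas.apply(lambda col: col.dt.total_seconds())
print(seconds)
```

The result is an ordinary float DataFrame, which is the numeric form the box plot call needs.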

Finally, use DataFrame.plot.box:

df.plot.box()

[graph: box plot of the ID1 and ID2 columns]

You can also check the docs.
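
The question also asks for one plot with all IDs together and another with a separate box plot per ID. A possible sketch of both, assuming a pandas version recent enough to have the .plot.box accessor (0.14.0 does not have it) and matplotlib installed. The seconds DataFrame here is random stand-in data, not the asker's, and the variable names are hypothetical:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for a DataFrame of per-ID deltas in seconds.
rng = np.random.RandomState(0)
seconds = pd.DataFrame(rng.normal(1.0, 0.005, size=(20, 2)),
                       columns=['ID1', 'ID2'])

# 1) All IDs side by side on one set of axes.
ax_all = seconds.plot.box()
ax_all.set_ylabel('seconds')

# 2) One subplot per ID, each with its own y-axis.
fig, axes = plt.subplots(1, len(seconds.columns), squeeze=False)
for ax, col in zip(axes[0], seconds.columns):
    seconds[[col]].plot.box(ax=ax)
    ax.set_title(col)
```

Because the second figure is sized from seconds.columns, the same loop should work for any number of IDs. In an interactive session, drop the matplotlib.use line and call plt.show() at the end.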

Upvotes: 1
