Reputation: 333
Good afternoon,
I can't find a way to graph historical data (a time series) and would appreciate your feedback on my problem.
I have a csv file containing a history of pids based on some criteria. The file looks like this:
|2019-12-13 14:00:00| 123456 |
|2019-12-13 14:00:00| 345678 |
|2019-12-13 14:00:20| 123456 |
|2019-12-13 14:00:20| 345678 |
|2019-12-13 14:00:40| 123456 |
|2019-12-13 14:00:40| 345678 |
|2019-12-13 14:00:40| 678123 |
|2019-12-13 14:01:00| 123456 |
|2019-12-13 14:01:00| 678123 |
so we have:
I'd like to draw a line graph with my timestamps on the X-axis and my pids on the Y-axis, to see the creation/death of my processes within my timeframe.
I'd start with storing my data in a pandas dataframe but then I don't know how to move forward.
Any recommendation to help me continue?
Thanks in advance
Upvotes: 1
Views: 1035
Reputation: 2407
Group by 'pid', then within each group set the time as the index and rename the column to the value of the pid. Then concatenate the resulting data frames:
r = [grp.set_index("time")
        .assign(pid=idx)
        .rename(columns={"pid": pid})
     for idx, (pid, grp) in enumerate(df.groupby("pid"), 1)]
e.g.: r[0]
123456
time
2019-12-13 14:00:00 1
2019-12-13 14:00:20 1
2019-12-13 14:00:40 1
2019-12-13 14:01:00 1
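# rslt=pd.concat(r,axis=1).fillna(0).astype(int)  # variant that gives the table below (gaps shown as 0)
rslt=pd.concat(r,axis=1)  # leaves NaN where a pid is absent, so its line stops there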
123456 345678 678123
time
2019-12-13 14:00:00 1 2 0
2019-12-13 14:00:20 1 2 0
2019-12-13 14:00:40 1 2 3
2019-12-13 14:01:00 1 0 3
import matplotlib.pyplot as plt
rslt.plot()
plt.show()
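For anyone who wants to run the above end to end, here is a minimal sketch that builds `df` from the sample rows in the question. It assumes the file is pipe-delimited exactly as shown and has no header row; the column names `time` and `pid` are the ones this answer uses and are not given in the question, and `raw` stands in for the file's contents.

```python
import pandas as pd

# Sample rows copied from the question; in practice this text would come from the file.
raw = """\
|2019-12-13 14:00:00| 123456 |
|2019-12-13 14:00:00| 345678 |
|2019-12-13 14:00:20| 123456 |
|2019-12-13 14:00:20| 345678 |
|2019-12-13 14:00:40| 123456 |
|2019-12-13 14:00:40| 345678 |
|2019-12-13 14:00:40| 678123 |
|2019-12-13 14:01:00| 123456 |
|2019-12-13 14:01:00| 678123 |
"""

# Split each line on "|" after dropping the outer pipes, then strip the padding.
rows = []
for line in raw.strip().splitlines():
    ts, pid = [field.strip() for field in line.strip().strip("|").split("|")]
    rows.append((ts, int(pid)))

df = pd.DataFrame(rows, columns=["time", "pid"])  # column names are an assumption
df["time"] = pd.to_datetime(df["time"])

# One single-column frame per pid: index = time, value = 1, 2, 3, ... (the line's height).
frames = [
    grp.set_index("time").assign(pid=idx).rename(columns={"pid": pid})
    for idx, (pid, grp) in enumerate(df.groupby("pid"), 1)
]

# Align the frames on time; a pid missing at a timestamp becomes NaN.
rslt = pd.concat(frames, axis=1)
print(rslt)
```

Calling `rslt.plot()` on this result draws one horizontal line per pid, and each line is drawn only over the timestamps where that pid appears in the data.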
Upvotes: 2
Reputation: 150805
Here's what I would do:
# for shifting and naming lines
codes, names = df['pid'].factorize()
ax = (df.assign(pid_name=codes)
        .pivot(index='timestamp', columns='pid_name', values='pid')
        .plot()
     )
# rename legend
h,l = ax.get_legend_handles_labels()
ax.legend(h, names)
Output:
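To see what the pivot actually hands to `.plot()`, here is a small sketch that stops just before plotting. It rebuilds the question's sample data by hand; the column names `timestamp` and `pid` follow this answer and are an assumption about how the file was loaded, and `wide` is just an illustrative variable name.

```python
import pandas as pd

# The question's sample rows, typed in directly (column names are an assumption).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-12-13 14:00:00", "2019-12-13 14:00:00",
        "2019-12-13 14:00:20", "2019-12-13 14:00:20",
        "2019-12-13 14:00:40", "2019-12-13 14:00:40", "2019-12-13 14:00:40",
        "2019-12-13 14:01:00", "2019-12-13 14:01:00",
    ]),
    "pid": [123456, 345678, 123456, 345678, 123456, 345678, 678123, 123456, 678123],
})

# factorize() numbers the pids 0, 1, 2, ... in order of first appearance
# and returns the distinct pids alongside those codes.
codes, names = df["pid"].factorize()

# One column per code, one row per timestamp; a cell holds the pid itself,
# or NaN when that pid has no row at that timestamp.
wide = (df.assign(pid_name=codes)
          .pivot(index="timestamp", columns="pid_name", values="pid"))
print(wide)
print(list(names))
```

Each column is constant wherever it has a value, so the plot shows one flat line per pid at the height of the pid number, and the NaN cells are where a line is missing. The legend would otherwise read 0, 1, 2, which is why the labels are swapped for `names` afterwards.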
Upvotes: 3