Reputation: 355
I'm a numpy newbie, using numpy 1.10.2 and Python 2.7.6 on Linux. I have a file of 17M datetimes, like "2015-12-24 03:39:02.012".
I want to plot the differences, d[n]-d[n-1], as a function of time.
What is the numpy-ish way to get an ndarray from this file, and then some matplotlib way to plot diff vs. datetime (it doesn't matter whether each diff is paired with datetime n-1 or n+1)?
I don't need a blinding speed hack; I'd rather learn the idiomatic numpy techniques.
Data looks like:
2015-12-24 03:39:02.009
2015-12-24 03:39:02.012
2015-12-24 03:39:02.015
2015-12-24 03:39:02.018
2015-12-24 03:39:02.021
2015-12-24 03:39:02.024
2015-12-24 03:39:02.027
2015-12-24 03:39:02.030
2015-12-24 03:39:02.033
2015-12-24 03:39:02.036
2015-12-24 03:39:02.039
2015-12-24 03:39:02.042
2015-12-24 03:39:02.045
2015-12-24 03:39:02.048
2015-12-24 03:39:02.051
2015-12-24 03:39:02.054
2015-12-24 03:39:02.057
2015-12-24 03:39:02.060
2015-12-24 03:39:02.063
2015-12-24 03:39:02.066
... 17M lines
So, to be clear, I want to plot something like
datetime64(2015-12-24 03:39:02.009), 3 # second datetime-first datetime
datetime64(2015-12-24 03:39:02.012), 3 # third datetime-second datetime
datetime64(2015-12-24 03:39:02.015), 3 # fourth datetime-third datetime
...
What I'm really looking for is spikes in the interval and what time the spikes happened.
Upvotes: 2
Views: 68
Reputation: 85462
Pandas can read the file in one line:
from matplotlib import pyplot as plt
import pandas as pd
df = pd.read_csv('data.txt', header=None, parse_dates=[0], names=['date'])
The result looks like this:
Calculate the difference:
diff = df[1:] - df.shift()[1:]
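As an alternative to the shift-and-subtract line above, pandas also has a built-in Series.diff() that computes d[n]-d[n-1] directly. The following is a minimal sketch, not the answerer's code: it reads a few made-up sample lines from an in-memory buffer instead of data.txt, and the names sample, intervals and interval_ms are only illustrative.

```python
import io

import pandas as pd

# Made-up sample standing in for data.txt (same format as the question).
sample = io.StringIO(
    "2015-12-24 03:39:02.009\n"
    "2015-12-24 03:39:02.012\n"
    "2015-12-24 03:39:02.015\n"
    "2015-12-24 03:39:02.018\n"
)

df = pd.read_csv(sample, header=None, names=['date'])
# Convert the text column to datetime64 explicitly.
df['date'] = pd.to_datetime(df['date'])

# diff() yields d[n] - d[n-1]; the first row has no predecessor (NaT), so drop it.
intervals = df['date'].diff().iloc[1:]

# Express the intervals in milliseconds as plain floats.
interval_ms = intervals.dt.total_seconds() * 1000.0
print(interval_ms.round(3).tolist())
```

Each interval here is aligned with the later of the two timestamps; since the question says either alignment is fine, that choice is arbitrary.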
Plot the result:
plt.plot(df[1:], diff.values)
You can convert the values into seconds (on current pandas, where get_values() no longer exists, use the dt accessor):
seconds = diff.date.dt.total_seconds()
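Since the question asks for a numpy-ish route and is really about finding spikes, here is a rough sketch using plain numpy datetime64 arrays and np.diff. It is an illustration under assumptions, not part of the original answer: the timestamps are a made-up sample with one artificial gap, and threshold_ms is a hypothetical cutoff you would pick for your own data.

```python
import numpy as np

# Made-up sample with one artificial 500 ms gap; the real data would be
# the 17M lines read from the file.
lines = [
    "2015-12-24 03:39:02.009",
    "2015-12-24 03:39:02.012",
    "2015-12-24 03:39:02.015",
    "2015-12-24 03:39:02.515",
    "2015-12-24 03:39:02.518",
]

# numpy parses ISO-style strings directly into datetime64 values.
times = np.array(lines, dtype='datetime64[ms]')

# np.diff on datetime64 gives timedelta64: gaps[i] = times[i+1] - times[i].
gaps = np.diff(times)

# Dividing by a unit timedelta turns the gaps into plain floats (milliseconds).
gap_ms = gaps / np.timedelta64(1, 'ms')

# Hypothetical cutoff: anything above this counts as a spike.
threshold_ms = 10.0
spike_idx = np.flatnonzero(gap_ms > threshold_ms)

# Report each spike with the timestamp that ended the long gap.
spike_times = times[1:][spike_idx]
for t, g in zip(spike_times, gap_ms[spike_idx]):
    print(t, g)
```

For plotting, times[1:] and gap_ms have the same length, so they can be passed as the x and y arguments of plt.plot; how well datetime64 values are handled on the axis depends on the matplotlib version.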
Upvotes: 1