George Young

Reputation: 355

How to load a file of datetimes into a NumPy ndarray and plot the differences with matplotlib?

I'm a numpy newbie, using numpy 1.10.2 and Python 2.7.6 on Linux. I have a file of 17M datetimes, like "2015-12-24 03:39:02.012". I want to plot the differences, d[n]-d[n-1], as a function of time.

What is the idiomatic NumPy way to load this file into an ndarray, and how can I then plot the differences against the datetimes with matplotlib? (It doesn't matter whether each difference is paired with the earlier or the later datetime.)

I don't need a blinding speed hack; I'd rather learn the idiomatic numpy techniques.

The data looks like this:

2015-12-24 03:39:02.009
2015-12-24 03:39:02.012
2015-12-24 03:39:02.015
2015-12-24 03:39:02.018
2015-12-24 03:39:02.021
2015-12-24 03:39:02.024
2015-12-24 03:39:02.027
2015-12-24 03:39:02.030
2015-12-24 03:39:02.033
2015-12-24 03:39:02.036
2015-12-24 03:39:02.039
2015-12-24 03:39:02.042
2015-12-24 03:39:02.045
2015-12-24 03:39:02.048
2015-12-24 03:39:02.051
2015-12-24 03:39:02.054
2015-12-24 03:39:02.057
2015-12-24 03:39:02.060
2015-12-24 03:39:02.063
2015-12-24 03:39:02.066

... 17M lines

So, to be clear, I want to plot something like

datetime64(2015-12-24 03:39:02.009), 3 ms  # second datetime - first datetime
datetime64(2015-12-24 03:39:02.012), 3 ms  # third datetime - second datetime
datetime64(2015-12-24 03:39:02.015), 3 ms  # fourth datetime - third datetime

...

What I'm really looking for is spikes in the interval, and the times at which those spikes occurred.
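For the pure-NumPy route the question asks about, here is a minimal sketch. It assumes every non-blank line holds exactly one timestamp in the format shown above. NumPy parses the strings directly into datetime64 with millisecond resolution, and np.diff on a datetime64 array returns timedelta64 values. The helper names load_datetimes and intervals_ms are hypothetical. It builds a Python list of all lines first, which is simple rather than fast.

```python
import numpy as np


def load_datetimes(lines):
    """Parse 'YYYY-MM-DD hh:mm:ss.fff' strings into a datetime64[ms] array.

    Hypothetical helper: assumes one timestamp per line; blank lines are skipped.
    """
    stripped = (line.strip() for line in lines)
    return np.array([s for s in stripped if s], dtype="datetime64[ms]")


def intervals_ms(times):
    """Return d[n] - d[n-1] as integer milliseconds (length len(times) - 1)."""
    # np.diff on datetime64[ms] yields timedelta64[ms]; cast to plain integers.
    return np.diff(times).astype(np.int64)


sample = """2015-12-24 03:39:02.009
2015-12-24 03:39:02.012
2015-12-24 03:39:02.015
"""
times = load_datetimes(sample.splitlines())
deltas = intervals_ms(times)
print(times[:-1])  # each difference paired with the earlier timestamp
print(deltas)
```

With the real file, passing the open file object (for example inside with open('data.txt') as f:) to load_datetimes should work the same way, since iterating a file yields its lines.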

Upvotes: 2

Views: 68

Answers (1)

Mike Müller

Reputation: 85462

Pandas can read the file in one line:

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv('data.txt', header=None, parse_dates=[0], names=['date'])
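To check what this call produces without the 17M-line file, here is a small sketch that feeds a few of the sample lines through io.StringIO, standing in for data.txt, with the same arguments. The parsed column should come back with a datetime64 dtype.

```python
import io

import pandas as pd

sample = io.StringIO(
    "2015-12-24 03:39:02.009\n"
    "2015-12-24 03:39:02.012\n"
    "2015-12-24 03:39:02.015\n"
)
# Same call as above, but reading from an in-memory buffer instead of data.txt.
df = pd.read_csv(sample, header=None, parse_dates=[0], names=["date"])
print(df.dtypes)  # the 'date' column has a datetime64 dtype
print(df.head())
```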

The result is a DataFrame with a single column, date, holding datetime64 values.

Calculate the differences d[n] - d[n-1]:

diff = df[1:] - df.shift()[1:]
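A somewhat more direct pandas spelling is Series.diff(), which subtracts the previous row from each row. Its first element is NaT because there is no earlier value, so this sketch drops it and then checks that the result matches the shift-based expression. The three-line sample stands in for the real data.

```python
import io

import pandas as pd

sample = io.StringIO(
    "2015-12-24 03:39:02.009\n"
    "2015-12-24 03:39:02.012\n"
    "2015-12-24 03:39:02.015\n"
)
df = pd.read_csv(sample, header=None, parse_dates=[0], names=["date"])

# Series.diff() computes d[n] - d[n-1]; element 0 is NaT, so drop it.
diffs = df["date"].diff().iloc[1:]

# The shift-based expression from the answer, reduced to its single column.
shifted = (df[1:] - df.shift()[1:])["date"]
print(diffs.equals(shifted))
print(diffs)
```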

Plot the result:

plt.plot(df[1:], diff.values)
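A variant that converts the intervals to seconds before plotting gives the y-axis an explicit unit and labels. This is a sketch assuming a DataFrame with a datetime64 column named date, as produced by the read_csv call above; plot_intervals is a hypothetical helper name. Each interval is plotted against the later timestamp of its pair.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd


def plot_intervals(df, ax=None):
    """Plot d[n] - d[n-1] in seconds against d[n] (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots()
    # Interval in seconds; drop the leading NaN produced by diff().
    seconds = df["date"].diff().dt.total_seconds().iloc[1:]
    ax.plot(df["date"].iloc[1:], seconds, marker=".", linestyle="-")
    ax.set_xlabel("time")
    ax.set_ylabel("interval to previous timestamp (s)")
    return ax


sample = io.StringIO(
    "2015-12-24 03:39:02.009\n"
    "2015-12-24 03:39:02.012\n"
    "2015-12-24 03:39:02.015\n"
)
df = pd.read_csv(sample, header=None, parse_dates=[0], names=["date"])
ax = plot_intervals(df)
ax.figure.autofmt_xdate()  # tilt the datetime tick labels so they don't overlap
# plt.show()  # uncomment to open an interactive window
```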


You can convert the values into seconds:

seconds = diff.date.dt.total_seconds()
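Since the question is ultimately about finding spikes in the interval and when they happened, one possible next step is to keep only the rows whose interval exceeds a threshold. The threshold rule here (a multiple of the median interval) and the helper name find_spikes are assumptions for illustration, not part of the answer. The last sample row is made up purely to create a spike.

```python
import io

import pandas as pd


def find_spikes(df, factor=5.0):
    """Return timestamps whose gap to the previous one exceeds factor * median gap.

    Hypothetical helper; the median-based threshold is an illustrative choice.
    """
    seconds = df["date"].diff().dt.total_seconds()
    threshold = factor * seconds.median()  # median() skips the leading NaN
    mask = seconds > threshold  # NaN compares as False, so row 0 is never flagged
    return pd.DataFrame({"date": df.loc[mask, "date"], "interval_s": seconds[mask]})


sample = io.StringIO(
    "2015-12-24 03:39:02.009\n"
    "2015-12-24 03:39:02.012\n"
    "2015-12-24 03:39:02.015\n"
    "2015-12-24 03:39:02.018\n"
    "2015-12-24 03:39:02.118\n"  # made-up 100 ms gap to act as a spike
)
df = pd.read_csv(sample, header=None, parse_dates=[0], names=["date"])
spikes = find_spikes(df)
print(spikes)
```

A median-based threshold is used here because a few large gaps barely move the median, whereas they can inflate the mean; a fixed threshold in seconds would work just as well if the expected interval is known.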

Upvotes: 1
