Mike Sweeney

Reputation: 2046

Python line plot multiple time series on same plot

I have a file containing chronologically timestamped data for multiple time series. I would like to parse it in Python and then use matplotlib to create a single line plot with an independent line for each series. The data looks something like this:

time label   value
1.05 seriesA 3.925
1.09 seriesC 0.245
2.13 seriesB 12.32
2.73 seriesC 4.921

I've parsed the file into a dictionary of lists that contain (time,value) tuples keyed on the series label. I'm struggling with how to get from this to a single line plot with all this data. I want independent lines for seriesA, seriesB, seriesC, etc. on a single plot. Any pointers?

Edit: As requested the dictionary is below. I had a hard time figuring out the best way to store this data so maybe the data structure I'm using is also a problem. The keys below are the different time series labels and the values are a list of (time,value) tuples. In any case, here it is:

{'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)], 
'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0), 
(862.281483262, 0.0), (862.289038158, 153600000.0)], 'client3': 
[(862.004470762, 3295674368.0), (862.004563939, 3295674368.0), 
(862.03981821, 799014912.0), (862.040403314, 1599078400.0), 
(862.540269616, 3295674368.0), (862.55133097, 1599078400.0)]}

Upvotes: 3

Views: 12334

Answers (2)

vestland

Reputation: 61104

Short answer:

Highlight and ctrl+c the data below:

label        time         value
client1  861.991699  2.981890e+08
client1  862.000768  0.000000e+00
client2  861.781502  0.000000e+00
client2  861.789037  1.536000e+08
client2  862.281483  0.000000e+00
client2  862.289038  1.536000e+08
client3  862.004471  3.295674e+09
client3  862.004564  3.295674e+09
client3  862.039818  7.990149e+08
client3  862.040403  1.599078e+09
client3  862.540270  3.295674e+09
client3  862.551331  1.599078e+09

Then run this snippet:

# imports
import pandas as pd

# read data from the clipboard
df = pd.read_clipboard(sep='\\s+')

# reshape the data to get values by time for each label
df = df.pivot(index='time', columns='label', values='value')

# Replace NaNs by forward filling existing values
df = df.ffill()

# Back fill the missing values that remain at the beginning of the columns
df = df.bfill()

# A simple plot:
df.plot()

Then you'll get:

[Image: line chart with one line per client]


The Details

There are a few confusing elements in this question. You say your source data has the form:

time label   value
1.05 seriesA 3.925
1.09 seriesC 0.245
2.13 seriesB 12.32
2.73 seriesC 4.921

But the true content of your data is:

{'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)], 
'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0), 
(862.281483262, 0.0), (862.289038158, 153600000.0)], 'client3': 
[(862.004470762, 3295674368.0), (862.004563939, 3295674368.0), 
(862.03981821, 799014912.0), (862.040403314, 1599078400.0), 
(862.540269616, 3295674368.0), (862.55133097, 1599078400.0)]}

Putting the two together, the content and form of your data should be:

label        time         value
client1  861.991699  2.981890e+08
client1  862.000768  0.000000e+00
client2  861.781502  0.000000e+00
client2  861.789037  1.536000e+08
client2  862.281483  0.000000e+00
client2  862.289038  1.536000e+08
client3  862.004471  3.295674e+09
client3  862.004564  3.295674e+09
client3  862.039818  7.990149e+08
client3  862.040403  1.599078e+09
client3  862.540270  3.295674e+09
client3  862.551331  1.599078e+09

In any case, there is absolutely no reason to utilize a dictionary to obtain your

[...]single line plot with all this data. I want independent lines for seriesA, seriesB, seriesC, etc. on a single plot.

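I believe the most efficient approach is the one described in the "Reshaping and pivot tables" section of the pandas docs: pivot the data so that each label becomes its own column, indexed by time. From there you can plot the data directly using df.plot().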

Highlight and ctrl+c the data above, and you're good to go:

# imports
import pandas as pd

# read data from the clipboard
df = pd.read_clipboard(sep='\\s+')

# reshape the data to get values by time for each label
df = df.pivot(index='time', columns='label', values='value')
print(df)

This should represent the desired form of your data:

label           client1      client2       client3
time                                              
861.781502          NaN          0.0           NaN
861.789037          NaN  153600000.0           NaN
861.991699  298189000.0          NaN           NaN
862.000768          0.0          NaN           NaN
862.004471          NaN          NaN  3.295674e+09
862.004564          NaN          NaN  3.295674e+09
862.039818          NaN          NaN  7.990149e+08
862.040403          NaN          NaN  1.599078e+09
862.281483          NaN          0.0           NaN
862.289038          NaN  153600000.0           NaN
862.540270          NaN          NaN  3.295674e+09
862.551331          NaN          NaN  1.599078e+09

There are still a few issues to handle, given the somewhat peculiar time index: no two series share a timestamp, so every row has a value for only one client. To make this data plot-friendly, we should fill the missing values. This is easily done in the next snippet using df.ffill and df.bfill:

# Replace NaNs by forward filling existing values
df = df.ffill()

# Back fill the missing values that remain
# at the beginning of the columns
df = df.bfill()
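
For readers who would rather not go through the clipboard, here is a minimal sketch of the same steps starting from the dictionary posted in the question. The names `data`, `rows` and `long_df` are illustrative and not part of the original snippet; `ffill()` and `bfill()` do the forward and backward filling.

```python
import pandas as pd

# Dictionary copied from the question: label -> list of (time, value) tuples
data = {
    'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)],
    'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0),
                (862.281483262, 0.0), (862.289038158, 153600000.0)],
    'client3': [(862.004470762, 3295674368.0), (862.004563939, 3295674368.0),
                (862.03981821, 799014912.0), (862.040403314, 1599078400.0),
                (862.540269616, 3295674368.0), (862.55133097, 1599078400.0)],
}

# Flatten into long form: one (label, time, value) row per data point
rows = [(label, t, v) for label, points in data.items() for t, v in points]
long_df = pd.DataFrame(rows, columns=['label', 'time', 'value'])

# One column per label, indexed by time; cells without a sample become NaN
df = long_df.pivot(index='time', columns='label', values='value')

# Forward fill the gaps, then back fill the NaNs left at the top of each column
df = df.ffill().bfill()
print(df)
```

Building the rows from the dictionary keeps it as the only input, so the result does not depend on whatever happens to be on the clipboard.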

Now you'll get a line chart simply by using df.plot():

[Image: line chart with one line per client]

Edit:

Let me know what your data source is so that I can give you a few tips on how to read and store your data. Again, pandas is most likely the way to go.

Upvotes: 1

sacuL

Reputation: 51335

I like pandas for this type of problem.

First, put the data in a pandas dataframe:

import pandas as pd

data = {'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)], 
'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0), 
(862.281483262, 0.0), (862.289038158, 153600000.0)], 'client3': 
[(862.004470762, 3295674368.0), (862.004563939, 3295674368.0), 
(862.03981821, 799014912.0), (862.040403314, 1599078400.0), 
(862.540269616, 3295674368.0), (862.55133097, 1599078400.0)]}

time = []
label = []
value = []

for k, v in data.items():
    for tup in v:
        label.append(k)
        time.append(tup[0])
        value.append(tup[1])

df = pd.DataFrame({'time':time, 'label':label, 'value':value})

Resulting in this dataframe:

>>> df
      label        time         value
0   client1  861.991699  2.981890e+08
1   client1  862.000768  0.000000e+00
2   client2  861.781502  0.000000e+00
3   client2  861.789037  1.536000e+08
4   client2  862.281483  0.000000e+00
5   client2  862.289038  1.536000e+08
6   client3  862.004471  3.295674e+09
7   client3  862.004564  3.295674e+09
8   client3  862.039818  7.990149e+08
9   client3  862.040403  1.599078e+09
10  client3  862.540270  3.295674e+09
11  client3  862.551331  1.599078e+09

Then, you can do this:

import matplotlib.pyplot as plt

by_label = df.groupby('label')

# Draw one line per label on the same axes
for name, group in by_label:
    plt.plot(group['time'], group['value'], label=name)

plt.legend()
plt.show()
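
For comparison, the dictionary from the question can also be plotted without pandas. This is only a minimal sketch under a few assumptions: the `Agg` backend and the output file name `series.png` are choices made here so that it runs without a display, and each list is sorted by time in case the tuples are not already in order.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt

# Dictionary copied from the question: label -> list of (time, value) tuples
data = {
    'client1': [(861.991698574, 298189000.0), (862.000768158, 0.0)],
    'client2': [(861.781502324, 0.0), (861.78903722, 153600000.0),
                (862.281483262, 0.0), (862.289038158, 153600000.0)],
    'client3': [(862.004470762, 3295674368.0), (862.004563939, 3295674368.0),
                (862.03981821, 799014912.0), (862.040403314, 1599078400.0),
                (862.540269616, 3295674368.0), (862.55133097, 1599078400.0)],
}

fig, ax = plt.subplots()
for label, points in sorted(data.items()):
    # Sort by time, then split the (time, value) tuples into two sequences
    times, values = zip(*sorted(points))
    ax.plot(times, values, label=label)

ax.set_xlabel("time")
ax.set_ylabel("value")
ax.legend()
fig.savefig("series.png")  # hypothetical output file name
```

Each series keeps its own timestamps here, so no filling or reshaping is needed before plotting.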

Regarding how you should store your data in a dictionary: there are different ways to go about this, but to be able to use your data easily with pandas, I would use a dictionary of the form:

data = {'label':['client1', 'client1', 'client2', ...], 
 'time':[time1, time2, time3, ...], 
 'value':[value1, value2, value3, ...]}

making sure all your lists are ordered consistently (index 0 of all 3 keys is row 0 of your dataframe, index 1 is row 1, etc.). Then to import into pandas, all you would need to do is df = pd.DataFrame(data).
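
As a rough illustration of how such a dictionary could be filled while parsing, here is a sketch that reads the sample rows from the question. It assumes whitespace-separated columns in the order time, label, value with a single header line; the names `raw` and `columns` are made up for this example, and a real file would be opened with `open()` instead of `io.StringIO`.

```python
import io

# Sample rows copied from the question, standing in for the real file
raw = io.StringIO(
    "time label   value\n"
    "1.05 seriesA 3.925\n"
    "1.09 seriesC 0.245\n"
    "2.13 seriesB 12.32\n"
    "2.73 seriesC 4.921\n"
)

columns = {'label': [], 'time': [], 'value': []}

next(raw)  # skip the header line
for line in raw:
    if not line.strip():
        continue  # ignore blank lines
    time_str, label, value_str = line.split()
    # Append to all three lists together so that row i stays aligned across keys
    columns['label'].append(label)
    columns['time'].append(float(time_str))
    columns['value'].append(float(value_str))

print(columns)
```

Because every row appends to all three lists at once, the ordering requirement is met by construction, and `columns` can be handed to `pd.DataFrame` as is.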

Upvotes: 6
