Kevin S.

Reputation: 1210

matplotlib: How to use marker size / color as an extra dimension in plots?

I am plotting a time series where x is a series of datetime.datetime objects and y is a series of doubles.

I'd like to map the marker size to a third series z (and possibly also map marker color to a fourth series w), which in most cases could be accomplished with:

scatter(x, y, s=z, c=w)

except scatter() does not permit x being a series of datetime.datetime objects.

plot(x, y, marker='o', linestyle='None')

on the other hand, works with x being datetime.datetime (with proper tick labels), but marker size and color can only be set for all points at once, so there is no way to map them to extra series.

Seeing that scatter and plot each can do half of what I need, is there a way to do both?

UPDATE: following @tcaswell's question, I realized scatter raised a KeyError deep inside default_units() in matplotlib/dates.py on the line:

x = x[0]

and sure enough my x and y are both Series taken from a pandas DataFrame whose index has no label 0. I then tried two things (both feel somewhat hacky):

First, I tried changing the DataFrame index to 0..len(x), which led to a different error inside matplotlib/axes/_axes.py at:

offsets  = np.dstack((x,y))

dstack doesn't play nicely with pandas Series. So I then tried converting x, y and z to numpy.array:

scatter(numpy.array(x), numpy.array(y), s=numpy.array(z))

This almost worked, except that scatter seemed to have trouble auto-scaling the x axis and collapsed everything into a straight line, so I had to reset xlim explicitly to see the plot.

All of this is to say that scatter can do the job, albeit in a somewhat convoluted way. I had always thought matplotlib could take any array-like input, but apparently that is not quite true when the data are not plain numbers and need some internal conversion.
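
For anyone hitting the same KeyError: it can be reproduced without matplotlib at all. The snippet below is only a minimal sketch with a made-up frame (the column names and index labels are my assumptions, not the original data). It shows that x[0] on such a Series is a label lookup, while .iloc and .to_numpy() are positional:

```python
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical frame whose integer index does not contain the label 0.
t0 = datetime(2014, 5, 1)
df = pd.DataFrame(
    {"t": [t0 + timedelta(days=i) for i in range(3)], "y": [1.0, 2.0, 3.0]},
    index=[10, 11, 12],
)
x = df["t"]

try:
    x[0]                      # label lookup: there is no label 0 in the index
    lookup_failed = False
except KeyError:
    lookup_failed = True

first_by_position = x.iloc[0]   # positional access works regardless of labels
x_values = x.to_numpy()         # plain NumPy array, indexed by position
```

Passing x_values (or the result of date2num) instead of the Series sidesteps the label lookup entirely.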

UPDATE2 I also tried to follow @user3666197's suggestion (thanks for the editing tips btw). If I understood correctly, I first converted x into a series of 'matplotlib style days':

mx = mPlotDATEs.date2num(list(x))

which then allows me to directly call:

scatter(mx, y, s=z)

Then, to label the axis properly, I call:

gca().xaxis.set_major_formatter( DateFormatter('%Y-%m-%d %H:%M'))

(call show() to update the axis labels if in interactive mode)

It worked quite nicely and feels to me a more 'proper' way of doing things, so I'm going to accept that as the best answer.
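
Putting the pieces of UPDATE2 together, a self-contained sketch might look like this. The data are invented stand-ins for x, y, z and w, and the Agg backend line is only there so it runs without a display; fig.autofmt_xdate() and the colorbar are my additions, not part of the accepted suggestion:

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import dates as mPlotDATEs
from matplotlib.dates import DateFormatter

# Hypothetical data standing in for the four series x, y, z, w.
t0 = datetime(2014, 5, 1)
x = [t0 + timedelta(hours=6 * i) for i in range(20)]
y = np.sin(np.linspace(0.0, 3.0, 20))
z = np.linspace(10.0, 200.0, 20)   # marker area in points**2
w = np.linspace(0.0, 1.0, 20)      # mapped to colour through the default colormap

mx = mPlotDATEs.date2num(x)        # datetimes -> matplotlib date floats

fig, ax = plt.subplots()
points = ax.scatter(mx, y, s=z, c=w)
ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d %H:%M'))
fig.autofmt_xdate()                # slant the date labels so they do not overlap
fig.colorbar(points, ax=ax, label='w')
```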

Upvotes: 3

Views: 4718

Answers (3)

user3666197

Reputation: 1

Is there a way to do both? Yes.

However, let's work by example:

step A: convert from a datetime to a float that follows matplotlib's convention for dates/times
step B: add 3D | 4D | 5D capabilities ( using { color | size | alpha } to encode additional dimensions of information )


As usual, the devil is in the details.

matplotlib date numbers are close to, but not the same as, other day-count conventions:

#  mPlotDATEs.date2num.__doc__
#                  
#     *d* is either a class `datetime` instance or a sequence of datetimes.
#
#     Return value is a floating point number (or sequence of floats)
#     which gives the number of days (fraction part represents hours,
#     minutes, seconds) since 0001-01-01 00:00:00 UTC, *plus* *one*.
#     The addition of one here is a historical artifact.  Also, note
#     that the Gregorian calendar is assumed; this is not universal
#     practice.  For details, see the module docstring.

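So it is highly recommended to reuse matplotlib's own conversion tools. (The docstring above comes from an older release; from matplotlib 3.3 on, the default reference date is 1970-01-01, which is one more reason not to hand-roll the conversion.)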

from matplotlib import dates as mPlotDATEs   # helper functions num2date()
#                                            #              and date2num()
#                                            #              to convert to/from.
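
A quick round trip shows what the two helpers do. This is only a sketch with an arbitrary sample timestamp; it deliberately avoids assuming a particular reference date, since that depends on the matplotlib version:

```python
from datetime import datetime

from matplotlib import dates as mPlotDATEs

moment = datetime(2014, 5, 1, 12, 0, 0)     # hypothetical sample; naive datetimes are treated as UTC

as_float = mPlotDATEs.date2num(moment)       # days (with fraction) since matplotlib's reference date
restored = mPlotDATEs.num2date(as_float)     # timezone-aware datetime, UTC by default

day_fraction = as_float % 1                  # noon is half a day past midnight
```

The integer part of as_float differs between matplotlib versions, but the fractional part and the round trip do not, which is why it is safer to always go through date2num() / num2date().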

Managing axis-labels & formatting & scale (min/max) is a separate issue

Nevertheless, matplotlib gives you tools for this part too:

from matplotlib.dates   import  DateFormatter,    \
                                AutoDateLocator,   \
                                HourLocator,        \
                                MinuteLocator
#                               epoch2num was imported here as well; it is deprecated
#                               since matplotlib 3.5 and absent from later releases
from matplotlib.ticker  import  ScalarFormatter, FuncFormatter

and you may, for example, do:

    aPlotAX.set_xlim( x_min, x_MAX )               # X-AXIS LIMITs ------------------------------------------------------------------------------- X-LIMITs
    
    #lt.gca().xaxis.set_major_locator(      matplotlib.ticker.FixedLocator(  secs ) )
    #lt.gca().xaxis.set_major_formatter(    matplotlib.ticker.FuncFormatter( lambda pos, _: time.strftime( "%d-%m-%Y %H:%M:%S", time.localtime( pos ) ) ) )
    
    aPlotAX.xaxis.set_major_locator(   AutoDateLocator() )
    
    aPlotAX.xaxis.set_major_formatter( DateFormatter( '%Y-%m-%d %H:%M' ) )  # ----------------------------------------------------------------------------------------- X-FORMAT

    #--------------------------------------------- # 90-deg x-tick-LABELs

    plt.setp( plt.gca().get_xticklabels(),  rotation            = 90,
                                            horizontalalignment = 'right'
                                            )
    
    #------------------------------------------------------------------
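
The fragment above assumes that aPlotAX, x_min and x_MAX already exist. A runnable version could look roughly like the following sketch; the data, the one-hour margin and the Agg backend are my assumptions for illustration:

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib import dates as mPlotDATEs
from matplotlib.dates import AutoDateLocator, DateFormatter

# Hypothetical data: one point every 3 hours over two days.
t0 = datetime(2014, 5, 1)
stamps = [t0 + timedelta(hours=3 * i) for i in range(16)]
mx = mPlotDATEs.date2num(stamps)
y = list(range(16))

fig, aPlotAX = plt.subplots()
aPlotAX.scatter(mx, y)

# Explicit limits with a small margin, expressed in matplotlib date floats (days).
margin = 1.0 / 24.0                       # one hour
x_min, x_MAX = mx.min() - margin, mx.max() + margin
aPlotAX.set_xlim(x_min, x_MAX)

aPlotAX.xaxis.set_major_locator(AutoDateLocator())                   # tick positions
aPlotAX.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d %H:%M'))   # tick label format
plt.setp(aPlotAX.get_xticklabels(), rotation=90, horizontalalignment='right')
```

Setting the limits explicitly also covers the auto-scaling problem mentioned in the question.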

Adding { 3D | 4D | 5D } transcoding

To get a feel for the approach, check this example, in which the additional dimensions of information were encoded into { color | size | alpha }. Whereas { size | alpha } are properties of the scatter points themselves, for color matplotlib includes additional tools: a set of colour scales adapted to various domains or to human colour perception. A nice explanation of colour scales and normalisation is presented here.

You may have noticed that this 4D example still has a constant alpha ( it is not yet used as the 5th degree of freedom of a true 5D visualisation ).
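
For completeness, here is one way the full { color | size | alpha } encoding could be sketched. All series and their names are invented, and writing the alpha channel into the RGBA rows is just one possible technique (it avoids relying on per-point alpha support in scatter itself):

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import dates as mPlotDATEs
from matplotlib.colors import Normalize
from matplotlib.dates import DateFormatter

n = 30
rng = np.random.default_rng(0)           # fixed seed, hypothetical data
t0 = datetime(2014, 5, 1)
mx = mPlotDATEs.date2num([t0 + timedelta(hours=i) for i in range(n)])  # 1st dimension
y = rng.random(n)                        # 2nd dimension
size = 20.0 + 180.0 * rng.random(n)      # 3rd dimension -> marker area
temp = 15.0 + 10.0 * rng.random(n)       # 4th dimension -> colour
conf = 0.2 + 0.8 * rng.random(n)         # 5th dimension -> alpha, kept inside [0.2, 1]

norm = Normalize(vmin=temp.min(), vmax=temp.max())   # rescale temp to [0, 1]
rgba = plt.get_cmap('viridis')(norm(temp))           # (n, 4) array of RGBA rows
rgba[:, 3] = conf                                    # overwrite the alpha channel per point

fig, ax = plt.subplots()
pts = ax.scatter(mx, y, s=size, c=rgba)
ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d %H:%M'))
```

Normalize is what decides how the raw values are stretched over the colour scale, so a different normalisation or colormap can be swapped in without touching the rest.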

Upvotes: 5

Alejandro

Reputation: 3402

You could probably try a for loop. This is a reasonably good option as long as you do not have too much data to plot. Here is a small example:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(100)
y = np.random.rand(100)

mass = np.linspace(1,10,100)

for i in range(len(x)):
    plt.plot(x[i],y[i],'ko', markersize=mass[i])
plt.show()

In principle you can do the same with the colour.
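
A possible sketch of the colour variant, assuming an extra series (called heat here, a name I made up) already scaled to [0, 1] and an arbitrary colormap:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable
x = rng.random(100)
y = rng.random(100)
mass = np.linspace(1, 10, 100)          # marker size, as in the example above
heat = np.linspace(0.0, 1.0, 100)       # hypothetical extra series mapped to colour

cmap = plt.get_cmap('viridis')
fig, ax = plt.subplots()
for xi, yi, mi, hi in zip(x, y, mass, heat):
    # cmap(value in [0, 1]) returns an RGBA tuple for that value
    ax.plot(xi, yi, marker='o', linestyle='None',
            markersize=mi, color=cmap(hi))
```

Every point becomes its own Line2D object, which is why this only scales to modest amounts of data.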

Upvotes: 1

Ilya Peterov

Reputation: 2065

You can probably convert the objects in x from datetime.datetime to numbers (seconds since the 1970 epoch, in local time):

import time

x = [time.mktime(elem.timetuple()) for elem in x]

and then pass it to scatter

scatter(x, y, s=z, c=w)
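
With this approach the axis shows raw seconds unless the ticks are formatted back into dates. A minimal sketch of that, with invented data and a helper name (format_seconds) of my own choosing:

```python
import time
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Hypothetical data standing in for x, y, z, w.
t0 = datetime(2014, 5, 1, 12, 0)
x = [t0 + timedelta(hours=i) for i in range(10)]
y = list(range(10))
z = [20 + 10 * i for i in range(10)]
w = [i / 9 for i in range(10)]

secs = [time.mktime(elem.timetuple()) for elem in x]   # local time -> seconds since the epoch

def format_seconds(value, pos=None):
    """Render seconds since the epoch as a local date and time string."""
    return time.strftime('%Y-%m-%d %H:%M', time.localtime(value))

fig, ax = plt.subplots()
ax.scatter(secs, y, s=z, c=w)
ax.xaxis.set_major_formatter(FuncFormatter(format_seconds))
```

Note that time.mktime() interprets the datetimes as local time and drops microseconds.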

Upvotes: 1
