najuste

Reputation: 234

Plotting a matrix in matplotlib, while taking indexed data for positive and negative values, returns wrong plot axes or data

I have a set of variables representing some geographical data, plus one variable (events) coded as positive or negative. When plotting two variables in a scatterplot, I try to plot the rows with negative and positive values in separate colours. The problem already appears when plotting, for example, just the positive values: the plotted values do not correspond to the variables' min-max ranges. It looks as if completely different values were plotted, or as if the values were rescaled to another axis. The x and y axes both get the same scaling. System: Windows 8 64-bit, with 32-bit Python 2.7 (ships with ArcGIS), numpy 1.8.1, matplotlib (can't check the version).

The code is something like this:

>>> df.head()   ## DATA SAMPLE ##
       dem_sl  events  gwflabst  kipp_macht  luftb_hydgr_add
5056  4.01518       0  0.174846     3.56536         2.666560
5057  3.84420       0  0.000000     6.70155         2.193530
5058  3.95850       0  0.000000     7.18019         2.350860
5059  4.42980       0  0.661806     1.23403         3.514760
5496  1.25325       0  0.070530     9.10564        -0.821533

# df: pandas DataFrame, cleaned of NaN, with lat/lon columns dropped
pos = np.where(df['events'] == 1)
neg = np.where(df['events'] == 0)
out = np.asmatrix(df)
# while looping through var:
for t in var:
  for tt in var:
    if t != tt:

Later I added some extra printing here to show the min and max values of both variables.

[Dbg]>>> print '...Plotting '+t, ' with min-max: ', df[t].min(), '---', df[t].max()
...Plotting kipp_macht  with min-max:  0.0 --- 52.7769
[Dbg]>>> print '...Plotting '+tt, ' with min-max: ', df[tt].min(), '---', df[tt].max()
...Plotting luftb_hydgr_add  with min-max:  -2.70172 --- 34.7528

And when I plot the first scatterplot:

                                #col index of the var
  plt.scatter(out[np.array(neg), df.columns.get_loc(t)],out[np.array(neg), df.columns.get_loc(tt)], marker='x', c='r')
  plt.scatter(out[np.array(pos), df.columns.get_loc(t)],out[np.array(pos), df.columns.get_loc(tt)], marker='+', c='b')
  plt.show()

  del var[0] # del the first var

It seems that the data on both axes get scaled by the overall minimum and maximum (taken across both axes). The same happens when I try other data with bigger differences in scale.

[Image: the plot with problems]

The most interesting part is that earlier, when plotting just once without loops, I made the scatterplot by accessing the data with a simple structure such as out[0], i.e. without 'searching' for indexes, and I got what I expected.

[Image: the result I should get]

So now I am not sure where the problem is, as even when I plot just the negative or just the positive values, they are already on a strange scale.

I tried creating a figure and saving it to a file, and tried clearing the plot with plt.clf(). Accessing the same values, I also tried plotting histograms to look at the dispersion, and everything is fine except the scatterplot.

Would appreciate any help!

Upvotes: 2

Views: 1032

Answers (1)

ali_m

Reputation: 74262

If I understand correctly, you want to plot kipp_macht against luftb_hydgr_add separately for cases where events == 0 and events == 1.

You could just use the contents of the events column to make boolean indices, and use these to index into kipp_macht and luftb_hydgr_add:

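# scatter takes colour and marker as keywords (c=, marker=), not a 'b+' format string
plt.scatter(df.kipp_macht[df.events == 1], df.luftb_hydgr_add[df.events == 1],
            c='b', marker='+', label='pos')
plt.scatter(df.kipp_macht[df.events == 0], df.luftb_hydgr_add[df.events == 0],
            c='r', marker='x', label='neg')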

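Or you could get the corresponding row positions with np.where and use these to index into kipp_macht and luftb_hydgr_add, although this is a slightly more roundabout way to do things. Because np.where returns positions rather than index labels, select with .iloc:

# np.where returns a tuple; [0] is the array of row positions
pos = np.where(df.events == 1)[0]
neg = np.where(df.events == 0)[0]
plt.scatter(df.kipp_macht.iloc[pos], df.luftb_hydgr_add.iloc[pos],
            c='b', marker='+', label='pos')
plt.scatter(df.kipp_macht.iloc[neg], df.luftb_hydgr_add.iloc[neg],
            c='r', marker='x', label='neg')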
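
One thing to watch with the np.where route: it returns row positions (0, 1, 2, ...), while the data sample in the question is labelled 5056, 5057, ..., so positions and labels differ. Here is a minimal sketch of the difference, using a small made-up frame (hypothetical values, not the asker's data), which shows that positional selection with .iloc agrees with the boolean mask:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame whose index labels do not start at 0,
# mimicking the 5056, 5057, ... labels in the question's data sample.
df = pd.DataFrame(
    {"events": [0, 1, 0, 1],
     "kipp_macht": [3.5, 6.7, 7.1, 1.2]},
    index=[5056, 5057, 5058, 5059],
)

# np.where returns a tuple of arrays; [0] takes the array of row positions.
pos = np.where(df["events"] == 1)[0]

# Positional selection with .iloc ...
by_position = df["kipp_macht"].iloc[pos]
# ... selects the same rows as label-aligned boolean masking.
by_mask = df["kipp_macht"][df["events"] == 1]

print(pos.tolist())
print(by_position.tolist())
print(by_position.equals(by_mask))
```

If the frame had a default 0..n-1 index, positions and labels would coincide and the distinction would not matter.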

You could also filter your entire dataframe according to the events column:

df_pos = df[df.events == 1]
df_neg = df[df.events == 0]

Which would then allow you to plot the relationship between any pair of parameters for the 'positive' and 'negative' case, e.g.:

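# colour and marker are passed as keywords; scatter does not accept a 'b+' format string
plt.scatter(df_pos.kipp_macht, df_pos.luftb_hydgr_add, c='b', marker='+', label='pos')
plt.scatter(df_neg.kipp_macht, df_neg.luftb_hydgr_add, c='r', marker='x', label='neg')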
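
To repeat this for every pair of variables, which is what the nested loop in the question appears to aim for, one option is itertools.combinations. It yields each unordered pair once, so there is no need to delete items from the list while looping. This is only a sketch under assumptions: the helper name plot_pairs, the toy values, and the Agg backend (chosen so the code runs without a display) are illustrative and not taken from the original post.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_pairs(df, flag="events"):
    """Scatter every pair of non-flag columns, split by flag == 1 / flag == 0.

    Hypothetical helper; returns a list of (x_name, y_name, figure) tuples.
    """
    df_pos = df[df[flag] == 1]
    df_neg = df[df[flag] == 0]
    columns = [c for c in df.columns if c != flag]
    figures = []
    # combinations() gives each unordered pair of column names exactly once.
    for x_name, y_name in itertools.combinations(columns, 2):
        fig, ax = plt.subplots()
        ax.scatter(df_neg[x_name], df_neg[y_name], c="r", marker="x", label="neg")
        ax.scatter(df_pos[x_name], df_pos[y_name], c="b", marker="+", label="pos")
        ax.set_xlabel(x_name)
        ax.set_ylabel(y_name)
        ax.legend()
        figures.append((x_name, y_name, fig))
    return figures


# Made-up values, only to exercise the helper.
toy = pd.DataFrame({
    "events": [0, 1, 0, 1],
    "kipp_macht": [3.5, 6.7, 7.1, 1.2],
    "luftb_hydgr_add": [2.6, 2.1, 2.3, 3.5],
    "dem_sl": [4.0, 3.8, 3.9, 4.4],
})
figures = plot_pairs(toy)
print([(x, y) for x, y, _ in figures])
```

Each returned figure can then be shown or saved individually, for example with fig.savefig and a file name of your choosing.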

Upvotes: 2
