3vlM33pl3

Reputation: 547

Matplotlib graph with same data doesn't overlap

I've generated some data and am trying to visualize it as two graphs in the same plot: one as a bar chart, the other as a line.

However, for some reason the two graphs don't line up.

Here is my code:

# roll two 6-sided dice 500 times
dice_1 = pd.Series(np.random.randint(1, 7, 500))
dice_2 = pd.Series(np.random.randint(1, 7, 500))

dices = dice_1 + dice_2

# plotting the frequency of the totals of two 6-sided dice rolls
fc = collections.Counter(dices)
freq = pd.Series(fc)
freq.plot(kind='line', alpha=0.6, linestyle='-', marker='o')
freq.plot(kind='bar', color='k', alpha=0.6)

And here is the graph.

[image: resulting plot]

The data set is the same, but the line graph is shifted two data points to the right (it starts at 4 instead of 2). If I plot them separately, both show up correctly, starting at 2. What is different when I plot them in the same graph, and how do I fix it?

Upvotes: 3

Views: 1841

Answers (2)

sgDysregulation

Reputation: 4417

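This happens because the line plot uses the Series index values (2 to 12) as x-coordinates, while the bar plot places its bars at positions 0, 1, 2, ... and only labels them with the index. Setting use_index to False on the line plot fixes the issue. I also suggest using groupby and len to count the frequency of each total: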

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# roll two 6-sided dice 500 times
dice_1 = pd.Series(np.random.randint(1, 7, 500))
dice_2 = pd.Series(np.random.randint(1, 7, 500))
dices = dice_1 + dice_2

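# maps each index label of dices to its total, so rolls are grouped by total
func = lambda x: dices.loc[x]

# count the rolls in each group; the dict form agg({'count': len})
# is rejected by pandas 1.0 and later, so len is passed directly
fc = dices.groupby(func).agg(len)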

ax = fc.plot(kind='line', alpha=0.6, linestyle='-',
             marker='o', use_index=False)
fc.plot(ax=ax, kind='bar', alpha=0.6, color='k')

plt.show()

The result is shown below.

[image: resulting plot]
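To make the mismatch visible, here is a minimal sketch that inspects the x-coordinates pandas hands to matplotlib for each kind of plot. The frequencies are made-up values rather than the question's data, and the Agg backend is only an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical frequencies for the totals 2..12, not the question's data
freq = pd.Series([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1], index=range(2, 13))

fig, (ax_line, ax_bar) = plt.subplots(1, 2)
freq.plot(kind="line", ax=ax_line)
freq.plot(kind="bar", ax=ax_bar)

# x-coordinates actually used by each plot
line_x = [float(x) for x in ax_line.lines[0].get_xdata()]
bar_x = [patch.get_x() + patch.get_width() / 2 for patch in ax_bar.patches]

print(line_x[:3])  # the index values, starting at 2
print(bar_x[:3])   # the bar positions, starting at 0
```

Drawn on one axes, the line therefore starts at x = 2, which is the position of the third bar (the one labelled 4).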

Upvotes: 1

roganjosh

Reputation: 13175

I haven't found a simpler way to do this than to re-supply the x-axis data explicitly. If this is representative of a much larger approach you are using, you may need to plot from a pd.Series() rather than from lists, but this code will at least give you the plot you want. Use items() on Python 3 and iteritems() on Python 2.

It seems that some auto-scaling of the x-axis takes place after the line plot, which is putting the two plots out of sync by two points (the lowest value possible). It might be possible to disable this autoscaling on the x-axis until both plots are made but this seems to be more difficult.

import collections
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# roll two 6-sided dice 500 times
dice_1 = pd.Series(np.random.randint(1, 7, 500))
dice_2 = pd.Series(np.random.randint(1, 7, 500))

dices = dice_1 + dice_2

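# plotting the frequency of the totals of two 6-sided dice rolls
fc = collections.Counter(dices)

# sort by total so the line is drawn left to right;
# items() is the Python 3 spelling (iteritems() on Python 2)
pairs = sorted(fc.items())
x_axis = [key for key, value in pairs]
y_axis = [value for key, value in pairs]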

plt.plot(x_axis, y_axis, alpha=0.6, linestyle='-', marker='o')
plt.bar(x_axis, y_axis, color='k', alpha=0.6, align='center')
plt.show()
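If you would rather keep the counts in a pd.Series, the same idea can be sketched as follows: both calls go straight to matplotlib with the same explicit x values. The name totals, the fixed seed and the Agg backend are my own assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); drop this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
dice_1 = pd.Series(rng.integers(1, 7, 500))
dice_2 = pd.Series(rng.integers(1, 7, 500))
totals = dice_1 + dice_2

# frequency of each total, ordered by total
freq = totals.value_counts().sort_index()

fig, ax = plt.subplots()
# both artists receive the same explicit x values, so they line up
ax.bar(freq.index, freq.values, color="k", alpha=0.6, align="center")
ax.plot(freq.index, freq.values, alpha=0.6, linestyle="-", marker="o")
```

Call plt.show() afterwards in an interactive session to display the figure.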

Upvotes: 1
