vare

Reputation: 91

How to dynamically update a plot in a loop?

I have the following snippet, and I would like to extend it so that the data from each loop iteration is plotted on the same canvas rather than on a separate canvas per iteration.

for level in range(len(result)):
  sizes = result[level].values()
  distribution=pd.DataFrame(Counter(sizes).items(), columns=['community size','number of communities'])
  distribution.plot(kind='scatter', x='community size', y='number of communities')

Ideally, I would also like the dots in the scatter plot to be color-coded by origin, so that all dots from the same loop iteration share one color.

I am fairly new to both matplotlib and pandas, so any help is highly appreciated.

Upvotes: 2

Views: 801

Answers (1)

unutbu

Reputation: 879113

Instead of calling plot many times, you could build the entire data set as one DataFrame; then you only need to call plot once.

Starting with

result = [{0: 21, 1: 7, 2: 67, 3: 12, 4: 15, 5: 7, 6: 54, 7: 49, 8: 50, 9: 31,
           10: 6, 11: 2, 12: 8, 13: 2, 14: 2, 15: 1, 16: 35, 17: 2, 18: 1, 19:
           4, 20: 2, 21: 4, 22: 3, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 
           29: 1}, 
          {0: 2, 1: 5, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1,
           11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1}]

you could build a DataFrame with columns level and size:

df = pd.DataFrame([(level,val) for level, dct in enumerate(result) 
                   for val in dct.values()],
                  columns=['level', 'size'])

which looks like this:

    level  size
0       0    21
1       0     7
2       0    67
...
45      1     1
46      1     1
47      1     1

Now we can group by the level, and count how many items of each size there are in each group:

size_count = df.groupby(['level'])['size'].apply(lambda x: x.value_counts())
# level    
# 0      1      9
#        2      5
#        7      2
# ...
# 1      1     11
#        2      4
#        3      2
#        5      1
# dtype: int64

The groupby/apply above returns a pd.Series. To make this a DataFrame, we can make the index level values into columns by calling reset_index(), and then assign column names to the columns:

size_count = size_count.reset_index()
size_count.columns = ['level', 'community size', 'number of communities']
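As a possible alternative to the two steps above, the same table can be built in one chain by grouping on both columns and counting rows. This is only a sketch: the small `result` here is made-up sample data, and the rows come out sorted by size rather than by count.

```python
import pandas as pd

# Made-up sample input, shaped like `result` above (one dict per level).
result = [{0: 3, 1: 3, 2: 5}, {0: 2, 1: 2, 2: 2, 3: 7}]

df = pd.DataFrame(
    [(level, val) for level, dct in enumerate(result) for val in dct.values()],
    columns=['level', 'size'],
)

# Count the rows for each (level, size) pair. reset_index(name=...) turns the
# index levels into columns and names the count column in the same step.
size_count = (
    df.groupby(['level', 'size'])
      .size()
      .reset_index(name='number of communities')
      .rename(columns={'size': 'community size'})
)
print(size_count)
```

Naming the count column through `reset_index(name=...)` avoids relying on whatever default name the counting step would otherwise give it.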

Now the desired plot can be generated with

size_count.plot(kind='scatter', x='community size', y='number of communities', 
                s=100, c='level')

s=100 controls the size of the dots, and c='level' tells plot to color the dots according to the value in the level column.


import pandas as pd
import matplotlib.pyplot as plt

result = [{0: 21, 1: 7, 2: 67, 3: 12, 4: 15, 5: 7, 6: 54, 7: 49, 8: 50, 9: 31,
           10: 6, 11: 2, 12: 8, 13: 2, 14: 2, 15: 1, 16: 35, 17: 2, 18: 1, 19:
           4, 20: 2, 21: 4, 22: 3, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 
           29: 1}, 
          {0: 2, 1: 5, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1,
           11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1}]

df = pd.DataFrame([(level,val) for level, dct in enumerate(result) 
                   for val in dct.values()],
                  columns=['level', 'size'])
size_count = df.groupby(['level'])['size'].apply(lambda x: x.value_counts())
size_count = size_count.reset_index()
size_count.columns = ['level', 'community size', 'number of communities']
cmap = plt.get_cmap('jet')
size_count.plot(kind='scatter', x='community size', y='number of communities', 
                s=100, c='level', cmap=cmap)
plt.show()


Using a colorbar might be appropriate if there are dozens of levels.


On the other hand, if there are only a few levels, using a legend would make more sense. In that case, it is more convenient to call plot once for each level value, since matplotlib's legend code creates one legend entry per plot call:

import pandas as pd
import matplotlib.pyplot as plt

result = [{0: 21, 1: 7, 2: 67, 3: 12, 4: 15, 5: 7, 6: 54, 7: 49, 8: 50, 9: 31,
           10: 6, 11: 2, 12: 8, 13: 2, 14: 2, 15: 1, 16: 35, 17: 2, 18: 1, 19:
           4, 20: 2, 21: 4, 22: 3, 23: 1, 24: 1, 25: 1, 26: 1, 27: 1, 28: 1, 
           29: 1}, 
          {0: 2, 1: 5, 2: 2, 3: 3, 4: 1, 5: 2, 6: 3, 7: 2, 8: 1, 9: 1, 10: 1,
           11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1}]

df = pd.DataFrame([(level,val) for level, dct in enumerate(result) 
                   for val in dct.values()],
                  columns=['level', 'size'])
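# Group by the column name rather than a one-element list: newer pandas
# versions yield tuple keys such as (0,) when iterating a list-based groupby.
groups = df.groupby('level')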
fig, ax = plt.subplots()
for level, grp in groups:
    size_count = grp['size'].value_counts()
    ax.plot(size_count.index, size_count, markersize=12, marker='o', 
            linestyle='', label='level {}'.format(level))
ax.legend(loc='best', numpoints=1)
ax.set_xlabel('community size')
ax.set_ylabel('number of communities')
ax.grid(True)
plt.show()
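
If you would rather keep the loop from the question, a minimal sketch is to create one Axes up front and pass it to every pandas plot call through `ax=ax`, so each iteration draws on the same canvas. The sample `result`, the `'C0'`, `'C1'`, ... color names and the `Agg` backend line are choices made for this illustration, not part of the original code.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop it for a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample input, shaped like `result` above (one dict per level).
result = [{0: 3, 1: 3, 2: 5}, {0: 2, 1: 2, 2: 2, 3: 7}]

fig, ax = plt.subplots()  # one shared Axes for all iterations

for level, dct in enumerate(result):
    distribution = pd.DataFrame(
        list(Counter(dct.values()).items()),
        columns=['community size', 'number of communities'],
    )
    # ax=ax draws on the existing Axes instead of creating a new figure;
    # 'C0', 'C1', ... pick successive colors from matplotlib's default cycle.
    distribution.plot(
        kind='scatter', x='community size', y='number of communities',
        s=100, color='C{}'.format(level % 10),
        label='level {}'.format(level), ax=ax,
    )

ax.legend(loc='best')
```

With an interactive backend, finish with `plt.show()`. Since the default cycle only has ten colors, this approach suits a handful of levels; for many levels the colorbar version is probably the better fit.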


Upvotes: 1
