Shion

Reputation: 319

Plotting Pandas DF with Numpy Arrays

I have a Pandas DataFrame with multiple columns, and each cell holds a NumPy array with a varying number of elements. I would like to plot all the elements of the array for every cell within a column.

I have tried

plt.plot(df['column'])
plt.plot(df['column'][0:])

Both give ValueError: setting an array element with a sequence

It is very important that these values are plotted against their corresponding index, as the index represents linear time in this dataframe. I would really appreciate it if someone showed me how to do this properly. Perhaps there is a package other than matplotlib.pyplot that is better suited for this?

Thank you

Upvotes: 0

Views: 3847

Answers (2)

JohanC

Reputation: 80319

plt.plot needs a list of x-coordinates together with an equally long list of y-coordinates. As you seem to want to use the index of the dataframe for the x-coordinate and each cell contents for the y-coordinates, you need to repeat the x-values as many times as the length of the y-coordinates.

Note that this format doesn't suit a line plot, as connecting subsequent points would create some strange vertical lines. plt.plot accepts a marker as its third parameter, for example '.' to draw a simple dot at each position.

A code example:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

N = 30
df = pd.DataFrame({f'column{c}':
                       [np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
                   for c in range(1, 6)})
legend_handles = []
colors = plt.cm.Set1.colors
desired_columns = df.columns
for column, color in zip(desired_columns, colors):
    for ind, cell in df[column].items():  # Series.iteritems() was removed in pandas 2.0
        if len(cell) > 0:
            plotted, = plt.plot([ind] * len(cell), cell, '.', color=color)
    legend_handles.append(plotted)
plt.legend(legend_handles, desired_columns)
plt.show()

example plot
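For a single column, the same "repeat the x-values" idea can be sketched without the inner Python loop. This is only an illustration under assumptions: the sample data and the names `lengths`, `x` and `y` are made up here, and `np.repeat` plus `np.concatenate` are swapped in for the per-cell `plt.plot` calls above.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: one column whose cells hold arrays of varying length.
rng = np.random.default_rng(0)
df = pd.DataFrame({"column1": [rng.normal(50, 1, rng.integers(3, 11)) for _ in range(30)]})

# Number of elements stored in each cell.
lengths = df["column1"].map(len).to_numpy()
# Repeat each index value once per element of its cell.
x = np.repeat(df.index.to_numpy(), lengths)
# Flatten all cell contents in row order, so x[i] and y[i] belong together.
y = np.concatenate(df["column1"].tolist())

fig, ax = plt.subplots()
ax.plot(x, y, ".")  # dots only; a line would connect points across cells
# Call plt.show() or fig.savefig(...) afterwards to view the result.

This makes one plot call per column instead of one per cell, which may matter for long dataframes.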

Note that pandas really isn't meant to store complete arrays inside cells. The preferred way is to create a dataframe in "long" form, with each value in a separate row (with the "index" repeated). Most pandas and seaborn functions don't know how to handle arrays inside cells.

Here's a way to create a long form that can be plotted with Seaborn:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

N = 30
df = pd.DataFrame({f'column{c}':
                       [np.random.normal(np.random.uniform(10, 100), 1, np.random.randint(3, 11)) for _ in range(N)]
                   for c in range(1, 6)})

desired_columns = df.columns
df_long_data = []
for column in desired_columns:
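    for ind, cell in df[column].items():  # Series.iteritems() was removed in pandas 2.0
        for val in cell:
            # one row per array element; 'row' avoids shadowing the built-in dict
            row = {'timestamp': ind, 'column_name': column, 'value': val}
            df_long_data.append(row)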
df_long = pd.DataFrame(df_long_data)
sns.scatterplot(x='timestamp', y='value', hue='column_name', data=df_long)
plt.show()

seaborn example
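As a possible shortcut, the same long form might also be built with pandas' own reshaping methods instead of nested loops. The following is a sketch, assuming pandas 0.25 or newer (where `DataFrame.explode` is available); the sample data is invented, and the column names simply mirror the ones used above.

import numpy as np
import pandas as pd

# Hypothetical sample data: five columns, each cell holding an array of 3 to 10 values.
rng = np.random.default_rng(0)
df = pd.DataFrame({f"column{c}": [rng.normal(10 * c, 1, rng.integers(3, 11)) for _ in range(30)]
                   for c in range(1, 6)})

df_long = (
    df.rename_axis("timestamp")
      .reset_index()                      # turn the index into a regular column
      .melt(id_vars="timestamp",          # one row per (timestamp, column) cell
            var_name="column_name",
            value_name="value")
      .explode("value")                   # one row per array element
      .astype({"value": float})           # explode leaves object dtype behind
      .reset_index(drop=True)
)

The resulting `df_long` has the same three columns as the loop-based version, so it should be usable in the `sns.scatterplot` call shown above. Note that an empty array in a cell would become a single NaN row rather than being dropped.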

Upvotes: 3

Sahaj Adlakha

Reputation: 156

As I understand your problem, each cell holds a NumPy array that you want to plot. When you pass a whole column to plt.plot(), as you did, you are passing a sequence of arrays, which it cannot convert to numbers. The plot() method will accept a single NumPy array, though, so you may need to pass every cell individually. This might help:

for column in df.columns:
    for cell in df[column]:
        plt.plot(cell)
        plt.show()

Upvotes: 0
