add-semi-colons

Reputation: 18810

plot pandas data frame but most columns have zeros

I am new to pandas and IPython; I just set everything up and am currently playing around. I have the following data frame:

  Field  10   20   30   40   50   60   70   80   90   95
0   A   0    0    0    0    0    0    0    0    1    3
1   B   0    0    0    0    0    0    0    1    4   14
2   C   0    0    0    0    0    0    0    1    2    7
3   D   0    0    0    0    0    0    0    1    5   15
4   u   0    0    0    0    0    0    0    1    5   14
5   K   0    0    0    0    0    0    1    2    7   21
6   S   0    0    0    0    0    0    0    1    3    8
7   E   0    0    0    0    0    0    0    1    3    8
8   F   0    0    0    0    0    0    0    1    6   16

I used a csv file to import this data:

import pandas as pd

df = pd.read_csv('/mycsvfile.csv', index_col=False, header=0)
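Because '/mycsvfile.csv' is not available to readers, a minimal way to reproduce the frame is to feed the table shown above to the same read_csv call as text. This is only a stand-in for the original file; the values are copied from the table, and io.StringIO replaces the file path.

```python
import io

import pandas as pd

# The table from the question as CSV text (stands in for '/mycsvfile.csv')
csv_text = """Field,10,20,30,40,50,60,70,80,90,95
A,0,0,0,0,0,0,0,0,1,3
B,0,0,0,0,0,0,0,1,4,14
C,0,0,0,0,0,0,0,1,2,7
D,0,0,0,0,0,0,0,1,5,15
u,0,0,0,0,0,0,0,1,5,14
K,0,0,0,0,0,0,1,2,7,21
S,0,0,0,0,0,0,0,1,3,8
E,0,0,0,0,0,0,0,1,3,8
F,0,0,0,0,0,0,0,1,6,16
"""

# Same options as the original call; the column labels come out as strings ('10', '20', ...)
df = pd.read_csv(io.StringIO(csv_text), index_col=False, header=0)
print(df.shape)  # (9, 11)
```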

As you can see, most of the columns are zero. The real data frame has a large number of rows, and in any given column most of the rows may be zero while only one or two hold a value such as 70.

I wonder how I can turn this into a nice graph that emphasizes the 70, 80 and 95 columns.

I found the following tutorial: http://pandas.pydata.org/pandas-docs/version/0.9.1/visualization.html, but I am still unable to get a good figure.

Upvotes: 0

Views: 1018

Answers (1)

Rutger Kassies

Reputation: 64443

It depends a bit on how you want to handle the zero values, but here is an approach:

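import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'a': [0,0,0,0,70,0,0,90,0,0,80,0,0],
                   'b': [0,0,0,50,0,60,0,90,0,80,0,0,0]})

fig, axs = plt.subplots(1, 2, figsize=(10, 4))

# plot the original, for comparison
df.plot(ax=axs[0])

# plot each column with its zeros removed, so only the non-zero points are joined
# (df.iteritems() was removed in pandas 2.0; df.items() is the replacement)
for name, col in df.items():
    col[col != 0].plot(ax=axs[1], label=name)

# keep the same x-range as the original and start the y-axis at zero
axs[1].set_xlim(df.index[0], df.index[-1])
axs[1].set_ylim(bottom=0)
axs[1].legend(loc=0)

plt.show()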
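The same zero-filtering idea can be applied to the question's own table, where entire columns are zero. A minimal sketch, assuming the goal is simply to hide columns that are zero in every row: the values are copied from the question, but the integer column labels are an illustrative choice (read_csv would give the strings '70', '80', and so on).

```python
import pandas as pd

# Values copied from the question's table; rows are the Field labels
rows = {
    "A": [0, 0, 0, 0, 0, 0, 0, 0, 1, 3],
    "B": [0, 0, 0, 0, 0, 0, 0, 1, 4, 14],
    "C": [0, 0, 0, 0, 0, 0, 0, 1, 2, 7],
    "D": [0, 0, 0, 0, 0, 0, 0, 1, 5, 15],
    "u": [0, 0, 0, 0, 0, 0, 0, 1, 5, 14],
    "K": [0, 0, 0, 0, 0, 0, 1, 2, 7, 21],
    "S": [0, 0, 0, 0, 0, 0, 0, 1, 3, 8],
    "E": [0, 0, 0, 0, 0, 0, 0, 1, 3, 8],
    "F": [0, 0, 0, 0, 0, 0, 0, 1, 6, 16],
}
df = pd.DataFrame.from_dict(
    rows, orient="index", columns=[10, 20, 30, 40, 50, 60, 70, 80, 90, 95]
)
df.index.name = "Field"

# Boolean mask over the columns: True where at least one row is non-zero
nonzero = df.loc[:, (df != 0).any()]
print(nonzero.columns.tolist())  # [70, 80, 90, 95]
```

With matplotlib installed, nonzero.plot(kind='bar') then draws one group of bars per Field containing only those columns.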


You could also use something like .replace(0, np.nan), but matplotlib doesn't draw lines across NaN values, so the line is broken at every gap. You would probably end up looping over the columns anyway (and then using, for example, dropna().plot()).
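The replace-then-dropna route mentioned above can be sketched as follows, using the example frame from this answer. The name series_per_column is just an illustrative variable; the point is that each column ends up as a series of only its non-zero points, indexed by their original positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 0, 0, 70, 0, 0, 90, 0, 0, 80, 0, 0],
                   'b': [0, 0, 0, 50, 0, 60, 0, 90, 0, 80, 0, 0, 0]})

# Treat zeros as missing values (the columns become float because of NaN)
masked = df.replace(0, np.nan)

# Plotting `masked` directly would leave gaps, since the line breaks at each NaN.
# Dropping the NaNs per column keeps only the real points, so they can be joined.
series_per_column = {name: col.dropna() for name, col in masked.items()}

for name, s in series_per_column.items():
    print(name, s.index.tolist(), s.tolist())
# a [4, 7, 10] [70.0, 90.0, 80.0]
# b [3, 5, 7, 9] [50.0, 60.0, 90.0, 80.0]
```

Each of these series can then be drawn with s.plot(ax=axs[1], label=name), which gives the same lines as the col[col != 0] loop.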

Upvotes: 4
