add-semi-colons

Reputation: 18830

Convert from matplotlib to ggplot2 within python

I have built a framework for evaluating algorithms. It has methods that compute metrics such as RMSE@K, NDCG@K and MAE@K from the data I pass in.

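import numpy as np
import matplotlib.pyplot as plt

# generate_metrics and data_file come from my evaluation framework
ndcg = []
rmse = []
mae = []
for i in range(11):
    results = generate_metrics(data_file, i)
    ndcg.append(np.mean(results['ndcg']))
    rmse.append(np.mean(results['rmse']))
    mae.append(np.mean(results['mae']))
# one line per metric, all on the same axes
plt.plot(ndcg)
plt.plot(rmse)
plt.plot(mae)
plt.show()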

I want to use ggplot within Python to plot all of this in one graph: the x axis is the @k value (0-10) and the y axis is the corresponding value from each list.

How can I convert the above lists to a data frame like this:

   at_k      ndcg      rmse       mae
1     1 0.4880583 0.3438043 0.3400933
2     2 0.4880583 0.3438043 0.3400933
3     3 0.4880583 0.3438043 0.3400933
4     4 0.4880583 0.3438043 0.3400933
5     5 0.4880583 0.3438043 0.3400933
6     6 0.4880583 0.3438043 0.3400933
7     7 0.4880583 0.3438043 0.3400933
8     8 0.4880583 0.3438043 0.3400933
9     9 0.4880583 0.3438043 0.3400933
10   10 0.4880583 0.3438043 0.3400933

and plot it using ggplot?

Upvotes: -1

Views: 734

Answers (1)

Carsten

Reputation: 18446

Please note that this answer uses yhat's ggpy, a Python port of ggplot. Other Python grammar-of-graphics implementations exist, such as plotnine, and this answer does not work with them.

After generating some random data in the same form as your dataset using

import numpy as np
ndcg, rmse, mae = [], [], []
for i in range(11):
    rand = np.random.sample(3)
    ndcg.append(rand[0])
    rmse.append(rand[1])
    mae.append(rand[2])

I can create a Pandas DataFrame from it:

import pandas as pd
at_k = range(1, 12)
df = pd.DataFrame({"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae})
print(df)

This outputs

    at_k       mae      ndcg      rmse
0      1  0.153102  0.546553  0.794357
1      2  0.882718  0.342260  0.762997
2      3  0.153298  0.695626  0.581455
3      4  0.073772  0.491996  0.384631
4      5  0.014066  0.369490  0.606842
5      6  0.892553  0.818312  0.396829
6      7  0.143114  0.739370  0.812050
7      8  0.847054  0.323221  0.932366
8      9  0.122838  0.613340  0.393237
9     10  0.645705  0.486312  0.138259
10    11  0.339063  0.223995  0.115242
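Note that the columns above come out in alphabetical order, which is how older pandas versions handled a dict. Newer versions should keep the order in which the keys were written. If the order matters to you, one option is to pass columns explicitly. This is only a minimal sketch, written for Python 3, and the seeded random generator is there just to make the run repeatable:

```python
import numpy as np
import pandas as pd

# Seeded generator, used only so the sketch gives the same numbers every run
rng = np.random.RandomState(0)
ndcg, rmse, mae = rng.random_sample((3, 11)).tolist()

at_k = list(range(1, 12))
df = pd.DataFrame(
    {"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae},
    columns=["at_k", "ndcg", "rmse", "mae"],  # explicit column order
)
print(df.head())
```

The rest of the answer does not depend on the column order, so this is purely cosmetic.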

Yay! But we can't use this for plotting with yhat's ggplot yet. Following this example, we need to transform the data from wide to long format, so that each row holds a single metric value:

df2 = pd.melt(df[['at_k', 'mae', 'ndcg', 'rmse']], id_vars=['at_k'])
print(df2)

Now we've got something like this (truncated):

    at_k variable     value
0      1      mae  0.153102
1      2      mae  0.882718
2      3      mae  0.153298
3      4      mae  0.073772
...
30     9     rmse  0.393237
31    10     rmse  0.138259
32    11     rmse  0.115242
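The column names variable and value are just the defaults that pd.melt picks. As a rough sketch of the same reshaping with more descriptive names (metric and score are names I made up for illustration, and the data is again random), it could look like this:

```python
import numpy as np
import pandas as pd

# Random stand-in data, seeded only for repeatability
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "at_k": list(range(1, 12)),
    "ndcg": rng.random_sample(11),
    "rmse": rng.random_sample(11),
    "mae": rng.random_sample(11),
})

# Wide to long: one row per (at_k, metric) pair.
# "metric" and "score" are illustrative names, not anything ggplot requires.
long_df = pd.melt(df, id_vars=["at_k"], var_name="metric", value_name="score")

print(long_df.shape)
print(sorted(long_df["metric"].unique()))
```

The long frame has 33 rows, one for each of the 3 metrics at each of the 11 values of at_k. If you rename the columns like this, the plotting call has to use y='score' and colour='metric' instead of the default names.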

Now it's time to plot:

from ggplot import *  # yhat's ggplot (ggpy) package
ggplot(aes(x='at_k', y='value', colour='variable'), data=df2) +\
    geom_point()

[Figure: scatter plot of value against at_k, one colour per variable]

Upvotes: 2
