Tom Roider

Reputation: 1

Creating a DataFrame from a dictionary of arrays

I'm pretty new to Python and I'm having difficulty converting a dictionary to a DataFrame. My dictionary contains the probabilities of an up-movement for different stocks on different days. When I try to convert it into a DataFrame, the stock names are used as column names, which is exactly what I want. My problem is that all the values appear in the first row of each column.

This is basically the code I tried to use:

At the beginning I have a sample of stocks like this:

stocks = ['MSFT', 'AAPL', 'AMZN']

To get the probabilities of an up-movement I used the following code:

proba = {stock: clf[stock].predict_proba(X_test[stock]) for stock in stocks}

print(proba)

This gives me the following output:

{'MSFT': array([[0.30994211],
   [0.15608782],
   [0.15608782],
   [0.16334815],
   [0.14721092],
   [0.29563944],
   [0.16334815],
   [0.24821587],
   [0.43182074],
   [0.30994211],
   [0.28825953],
   .
   .
   .
   [0.34160564]]), 'AAPL': array([[0.48241034],
   [0.47819121],
   [0.48937013],
   [0.49798732],
   [0.50132104],
   .
   .
   . 
   [0.03298367]]), 'AMZN': array([[0.51179782],
   [0.64532595],
   [0.56331474],
   [0.66499856],
   [0.55492011],
   [0.4623048 ],
   [0.4536123 ],
   [0.4613901 ],
   [0.39305493],
   [0.44297254],
   .
   .
   .])}

My goal is now to convert this dictionary into a DataFrame that looks like this:

    MSFT    AAPL    AMZN
0   0.875   0.983   0.276
1   0.345   0.765   0.342
2   0.654   0.444   0.874  
    ...     ...     ...
    ...     ...     ...

In the end, the DataFrame should have 280 rows and 3 columns.

Here's a small sample to work with:

proba = {stock: clf[stock].predict_proba(X_test[stock]) for stock in stocks}
proba = {stock: np.delete(proba[stock], 0, axis=1) for stock in stocks}
print(proba)

The result is:

{'MSFT': array([[0.49784439],
   [0.51812552],
   [0.35948374]]), 'AAPL': array([[0.29038393],
   [0.58038393],
   [0.52032512]]), 'AMZN': array([[0.64295894],
   [0.54295894],
   [0.39719920]])}

These arrays should be converted to one DataFrame that looks like this:

     MSFT         AAPL         AMZN
0    0.49784439   0.29038393   0.64295894
1    0.51812552   0.58038393   0.54295894
2    0.35948374   0.52032512   0.39719920

Hopefully the edit made it a bit clearer.

Upvotes: 0

Views: 2634

Answers (2)

Amit Singh

Reputation: 3063

import pandas as pd

# Flatten each value (a sequence of one-element rows) into a flat list
for key in res:
    res[key] = [x for sublist in res[key] for x in sublist]

# Build the DataFrame from the flattened dictionary
df = pd.DataFrame.from_dict(res)

Here res is your dictionary (proba in the question). Each value is a list of one-element lists, so you must flatten it into a single list before converting the dictionary into a DataFrame.
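The same flattening can also be done with NumPy instead of a list comprehension. This is only a minimal sketch: the proba dictionary below is a hand-typed stand-in built from the small sample in the question, not the output of a real classifier, and it assumes every array has shape (n, 1).

```python
import numpy as np
import pandas as pd

# Stand-in for the question's `proba` dictionary (values copied from the
# small sample); each value is a 2-D array of shape (3, 1).
proba = {
    "MSFT": np.array([[0.49784439], [0.51812552], [0.35948374]]),
    "AAPL": np.array([[0.29038393], [0.58038393], [0.52032512]]),
    "AMZN": np.array([[0.64295894], [0.54295894], [0.39719920]]),
}

# ravel() turns each (n, 1) array into a 1-D array of length n,
# so every probability lands in its own row.
df = pd.DataFrame({stock: values.ravel() for stock, values in proba.items()})
print(df)
```

The result has one column per stock and one row per observation, which is the layout asked for in the question.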

Upvotes: 1

theFrok

Reputation: 355

You shouldn't put the dict inside a list; just use pandas.DataFrame(proba), provided each value is a one-dimensional sequence. I would recommend the DataFrame.from_dict function, which gives the same result with its default parameters:

In [1]: import pandas
In [2]: d = {'a' : [1,2,3], 'b':[4,5,6], 'c':[7,8,9]}
In [3]: pandas.DataFrame.from_dict(d)
Out[3]:
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9

It also lets you change the orientation of the dict, which I found quite useful. You can pass orient='index' if your dict keys should become the index (row labels). You will probably want to pass columns as well to name your columns:

In [4]: pandas.DataFrame.from_dict(d, orient='index', columns=['first','second','third'])
Out[4]:
   first  second  third
a      1       2      3
b      4       5      6
c      7       8      9
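Applied to the stock probabilities, the two orientations could look like the sketch below. It assumes the arrays have already been flattened to one dimension. The flat dictionary reuses the sample values from the question, and the day_0, day_1, day_2 column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Already-flattened stand-in for the question's data (1-D arrays).
flat = {
    "MSFT": np.array([0.49784439, 0.51812552, 0.35948374]),
    "AAPL": np.array([0.29038393, 0.58038393, 0.52032512]),
    "AMZN": np.array([0.64295894, 0.54295894, 0.39719920]),
}

# Default orientation: dict keys become the column names.
by_day = pd.DataFrame.from_dict(flat)

# orient='index': dict keys become the row labels instead;
# the column names here are hypothetical.
by_stock = pd.DataFrame.from_dict(
    flat, orient="index", columns=["day_0", "day_1", "day_2"]
)

print(by_day)
print(by_stock)
```

For the layout requested in the question (stocks as columns), the default orientation is the one to use.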

Upvotes: 1
