Reputation: 1
I'm pretty new to Python and I'm having difficulty converting a dictionary to a DataFrame. My dictionary contains the probabilities of an up-movement for different stocks on different days. When I convert it into a DataFrame, the stock names are used as column names, which is exactly what I want. My problem is that all the values appear in the first row of each column.
This is basically the code I tried to use:
At the beginning, I have a sample of stocks like this:
stocks = ['MSFT', 'AAPL', 'AMZN']
To get the probabilities of an up-movement I used the following code:
proba = {stock: clf[stock].predict_proba(X_test[stock]) for stock in stocks}
print(proba)
gives me the following Output:
{'MSFT': array([[0.30994211],
[0.15608782],
[0.15608782],
[0.16334815],
[0.14721092],
[0.29563944],
[0.16334815],
[0.24821587],
[0.43182074],
[0.30994211],
[0.28825953],
.
.
.
[0.34160564]]), 'AAPL': array([[0.48241034],
[0.47819121],
[0.48937013],
[0.49798732],
[0.50132104],
.
.
.
[0.03298367]]), 'AMZN': array([[0.51179782],
[0.64532595],
[0.56331474],
[0.66499856],
[0.55492011],
[0.4623048 ],
[0.4536123 ],
[0.4613901 ],
[0.39305493],
[0.44297254],
.
.
.])}
My goal is now to convert this dictionary into a DataFrame that looks like this:
    MSFT   AAPL   AMZN
0  0.875  0.983  0.276
1  0.345  0.765  0.342
2  0.654  0.444  0.874
     ...    ...    ...
     ...    ...    ...
In the end, the DataFrame should have 280 rows and 3 columns.
Here's a small sample to work with:
proba = {stock: clf[stock].predict_proba(X_test[stock]) for stock in stocks}
proba = {stock: np.delete(proba[stock], 0, axis=1) for stock in stocks}
print(proba)
The result is:
{'MSFT': array([[0.49784439],
[0.51812552],
[0.35948374]]), 'AAPL': array([[0.29038393],
[0.58038393],
[0.52032512]]), 'AMZN': array([[0.64295894],
[0.54295894],
[0.39719920]])}
These arrays should be converted to one DataFrame that looks like this:
         MSFT        AAPL        AMZN
0  0.49784439  0.29038393  0.64295894
1  0.51812552  0.58038393  0.54295894
2  0.35948374  0.52032512  0.39719920
Hopefully the edit made it a bit clearer.
Upvotes: 0
Views: 2634
Reputation: 3063
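import pandas as pd

# `res` stands for the dictionary from the question (called `proba` there)
# Flatten each list of lists into a single flat list
for key in res:
    res[key] = [x for sublist in res[key] for x in sublist]
# Read the dictionary into a DataFrame
df = pd.DataFrame.from_dict(res)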
Each value in your dictionary is a list of one-element lists, so you must flatten it into a single flat list before converting the dictionary into a DataFrame.
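The same flattening can probably be done with NumPy directly, since the values in the question are arrays of shape (n, 1). The following is only a sketch: the hard-coded `proba` dictionary is a stand-in copied from the small sample in the question, and the name `flat` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's `proba`: one (n, 1) array per stock
proba = {
    "MSFT": np.array([[0.49784439], [0.51812552], [0.35948374]]),
    "AAPL": np.array([[0.29038393], [0.58038393], [0.52032512]]),
    "AMZN": np.array([[0.64295894], [0.54295894], [0.39719920]]),
}

# ravel() turns each (n, 1) array into a 1-D array of length n
flat = {stock: values.ravel() for stock, values in proba.items()}

# One column per stock, one row per observation
df = pd.DataFrame(flat)
print(df)
```

With the full data from the question, this should yield the 280 rows and 3 columns asked for, assuming every stock has the same number of predictions.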
Upvotes: 1
Reputation: 355
You shouldn't put the dict inside a list; just use pandas.DataFrame(proba).
I would recommend using the DataFrame.from_dict function, which gives the same result with default parameters:
In [1]: import pandas
In [2]: d = {'a' : [1,2,3], 'b':[4,5,6], 'c':[7,8,9]}
In [3]: pandas.DataFrame.from_dict(d)
Out[3]:
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
It also lets you change the orientation of the dict, which I found quite useful.
You can pass orient='index' to the function if your dict keys should become the row index. You will probably want to pass columns as well to name your columns:
In [4]: pandas.DataFrame.from_dict(d, orient='index', columns=['first','second','third'])
Out[4]:
   first  second  third
a      1       2      3
b      4       5      6
c      7       8      9
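To illustrate the first point about not wrapping the dict in a list, here is a small sketch that reuses the toy dictionary `d` from above. The names `wrapped` and `direct` are made up for the comparison, and whether the original code actually wrapped the dict this way is an assumption, since that line is not shown in the question.

```python
import pandas as pd

d = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}

# Wrapping the dict in a list makes pandas treat it as ONE record:
# a single row in which every cell holds a whole list
wrapped = pd.DataFrame([d])

# Passing the dict directly gives one row per list element
direct = pd.DataFrame(d)

print(wrapped.shape)  # one row, three columns
print(direct.shape)   # three rows, three columns
```

The `wrapped` frame matches the symptom described in the question, where all values end up in the first row of each column.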
Upvotes: 1