Reputation: 831
I've the following dictionary:
dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
'B': [[4, 4, 4], [4, 4, 4],],
'C': [[4, 6, 0]]
}
I want to convert it to a pd.DataFrame()
, expecting this:
id ColA ColB ColC
0 1 4 4
1 2 4 6
2 3 4 0
3 1 4
4 2 4
5 3 4
6 1
7 2
8 3
How can I do that? I'm trying
pd.DataFrame(dictAll.items(), columns=['ColA', 'ColB', 'ColC'])
But it obviously doesn't work!
Upvotes: 1
Views: 1103
Reputation: 27547
Here is how:
import pandas as pd
import numpy as np
dictA = {'A': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
'B': [[4, 4, 4], [4, 4, 4],],
'C': [[4, 6, 0]]}
df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
print(df)
Output:
ColA ColB ColC
0 1 4 4
1 2 4 6
2 3 4 0
3 1 4
4 2 4
5 3 4
6 1
7 2
8 3
Now, let's have a look at the problem one step at a time.
The first thing we might try is simply:
df = pd.DataFrame(dictA)
print(df)
Which, of course, return this error:
ValueError: arrays must all be same length
So now we need a way to be able to create dataframes from a dict
with arrays of different lengths. For that, we can:
df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in dictA.items()]))
print(df)
Output:
A B C
0 [1, 2, 3] [4, 4, 4] [4, 6, 0]
1 [1, 2, 3] [4, 4, 4] NaN
2 [1, 2, 3] NaN NaN
We want the dataframe to be vertical, so for each iteration, flatten out the lists with a list comprehension:
df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()]))
print(df)
Output:
A B C
0 1 4.0 4.0
1 2 4.0 6.0
2 3 4.0 0.0
3 1 4.0 NaN
4 2 4.0 NaN
5 3 4.0 NaN
6 1 NaN NaN
7 2 NaN NaN
8 3 NaN NaN
Now we want to replace all the NaN
s with blanks. For that, we need to import numpy as np
, and do:
df = pd.DataFrame(dict([(k, pd.Series([a for b in v for a in b])) for k, v in dictA.items()])).replace(np.nan, '')
print(df)
Output:
A B C
0 1 4 4
1 2 4 6
2 3 4 0
3 1 4
4 2 4
5 3 4
6 1
7 2
8 3
Finally use formatted string to convert the letters into "Col"
letters:
df = pd.DataFrame(dict([(f'Col{k}', pd.Series([a for b in v for a in b])) for k,v in dictA.items()])).replace(np.nan, '')
print(df)
Output:
ColA ColB ColC
0 1 4 4
1 2 4 6
2 3 4 0
3 1 4
4 2 4
5 3 4
6 1
7 2
8 3
Upvotes: 3