Reputation: 1121
I have three lists:
[1,2,3,4,5]
[5,4,6,7,2]
[1,2,4,5,6,7,8,9,0]
I want this kind of output:
A  B  C
1  5  1
2  4  2
3  6  4
4  7  5
5  2  6
      7
      8
      9
      0
I tried one approach, but it raised the error "arrays must all be same length".
Another attempt raised "Length of values does not match length of index".
Is there any way to get this kind of output?
Upvotes: 4
Views: 4941
Reputation: 402814
This is not easily supported, but it can be done: build the frame from a list of lists and transpose it (DataFrame.from_dict with the "index" orient also works). Assuming your lists are A, B, and C:
pd.DataFrame([A, B, C]).T
0 1 2
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
Another option is using DataFrame.from_dict:
pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
A B C
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
A third solution uses zip_longest and DataFrame.from_records:
from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])
A B C
0 1.0 5.0 1
1 2.0 4.0 2
2 3.0 6.0 4
3 4.0 7.0 5
4 5.0 2.0 6
5 NaN NaN 7
6 NaN NaN 8
7 NaN NaN 9
8 NaN NaN 0
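In the outputs above, the padded columns come back as floats because NaN is a float value. If integer columns are preferred, one possible follow-up is to cast to pandas' nullable integer dtype. This is only a sketch: it assumes a pandas version that supports the "Int64" extension dtype and that every non-missing value is a whole number. The name df_int is just illustrative.

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# Build the ragged frame as in the from_dict approach above;
# shorter lists are padded with NaN, so the values become floats.
df = pd.DataFrame.from_dict({"A": A, "B": B, "C": C}, orient="index").T

# Assumption: all non-missing values are whole numbers, so the cast
# to the nullable integer dtype is lossless. Missing cells become <NA>.
df_int = df.astype("Int64")
print(df_int)
```

The missing cells are then shown as <NA> instead of NaN, and the remaining values print without the trailing ".0".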
Upvotes: 6
Reputation: 11193
Here is an idea for a custom approach.
Define a couple of helper functions to pad the input data:
def longest(*lists):
    # Length of the longest input list
    return max(len(x) for x in lists)

def equalize(col, size):
    # Pad col with None up to the requested size
    delta = size - len(col)
    if delta == 0:
        return col
    return col + [None] * delta
Use them when building the DataFrame (col1, col2 and col3 are the three input lists):
import pandas as pd
size = longest(col1, col2, col3)
df = pd.DataFrame({'a':equalize(col1, size), 'b':equalize(col2, size), 'c':equalize(col3, size)})
Which returns
a b c
0 1.0 5.0 1
1 2.0 4.0 2
2 3.0 6.0 4
3 4.0 7.0 5
4 5.0 2.0 6
5 NaN NaN 7
6 NaN NaN 8
7 NaN NaN 9
8 NaN NaN 0
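The same padding idea can be sketched for any number of named lists. The helper name pad_columns and its fill parameter are hypothetical, introduced here only for illustration; this is one possible generalisation, not part of pandas.

```python
import pandas as pd

def pad_columns(columns, fill=None):
    """Return a copy of `columns` (a dict of lists) in which every list
    is padded with `fill` to the length of the longest one."""
    size = max(len(values) for values in columns.values())
    return {
        name: list(values) + [fill] * (size - len(values))
        for name, values in columns.items()
    }

data = {
    "a": [1, 2, 3, 4, 5],
    "b": [5, 4, 6, 7, 2],
    "c": [1, 2, 4, 5, 6, 7, 8, 9, 0],
}

# All lists now have equal length, so the plain constructor accepts them.
df = pd.DataFrame(pad_columns(data))
print(df)
```

Because the input lists are copied before padding, the original lists are left unchanged.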
Upvotes: 0
Reputation: 394159
An alternative is to build a Series from each list in a list comprehension
and construct a DataFrame from the result:
In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df
Out[61]:
A B C
0 1.0 5.0 1.0
1 2.0 4.0 2.0
2 3.0 6.0 4.0
3 4.0 7.0 5.0
4 5.0 2.0 6.0
5 NaN NaN 7.0
6 NaN NaN 8.0
7 NaN NaN 9.0
8 NaN NaN 0.0
%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])
1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So the last method (zip_longest with DataFrame.from_records) is the fastest on these small inputs.
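A closely related variant, not included in the timings above, passes a dict of Series straight to the constructor, which aligns them on their index and avoids the transpose. This is a minimal sketch using the question's lists:

```python
import pandas as pd

A = [1, 2, 3, 4, 5]
B = [5, 4, 6, 7, 2]
C = [1, 2, 4, 5, 6, 7, 8, 9, 0]

# Each Series gets a default 0..n-1 index; the constructor takes the
# union of those indexes and fills the gaps with NaN.
df = pd.DataFrame({"A": pd.Series(A), "B": pd.Series(B), "C": pd.Series(C)})
print(df)
```

Since each column is handled separately, only the padded columns need to hold NaN.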
Upvotes: 4