Coder
Coder

Reputation: 1121

Multiple lists to Pandas DataFrame

I have three list here

[1,2,3,4,5]

[5,4,6,7,2]

[1,2,4,5,6,7,8,9,0]

I want this kind of output:

A     B    C
1     5    1
2     4    2
3     6    4
4     7    5
5     2    6
           7
           8
           9
           0

I tried one syntax , but it gives me this error arrays must all be same length and another error was Length of values does not match length of index

Is there any way to get this kind of output?

Upvotes: 4

Views: 4941

Answers (3)

cs95
cs95

Reputation: 402814

This is not easily supported, but it can be done. DataFrame.from_dict will with the "index" orient. Assuming your lists are A, B, and C:

pd.DataFrame([A, B, C]).T

     0    1    2
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

Another option is using DataFrame.from_dict:

pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T

     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

A third solution with zip_longest and DataFrame.from_records:

from itertools import zip_longest
pd.DataFrame.from_records(zip_longest(A, B, C), columns=['A', 'B', 'C'])
# pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

     A    B  C
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0

Upvotes: 6

iGian
iGian

Reputation: 11193

An idea for a custom way.

Define a couple of methods to adjust the input data:

def longest(*lists):
  return max([ len(x) for x in lists])

def equalize(col, size):
  delta = size - len(col)
  if delta == 0: return col
  return col + [None for _ in range(delta)]

To be used building the dataframe:

import pandas as pd

size = longest(col1, col2, col3)
df = pd.DataFrame({'a':equalize(col1, size), 'b':equalize(col2, size), 'c':equalize(col3, size)})

Which returns

     a    b  c
0  1.0  5.0  1
1  2.0  4.0  2
2  3.0  6.0  4
3  4.0  7.0  5
4  5.0  2.0  6
5  NaN  NaN  7
6  NaN  NaN  8
7  NaN  NaN  9
8  NaN  NaN  0

Upvotes: 0

EdChum
EdChum

Reputation: 394159

alternative is to perform a list comprehension of a Series of each list and construct a df from this:

In[61]:
df = pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
df

Out[61]: 
     A    B    C
0  1.0  5.0  1.0
1  2.0  4.0  2.0
2  3.0  6.0  4.0
3  4.0  7.0  5.0
4  5.0  2.0  6.0
5  NaN  NaN  7.0
6  NaN  NaN  8.0
7  NaN  NaN  9.0
8  NaN  NaN  0.0

timings:

%timeit pd.DataFrame([pd.Series(x) for x in [A,B,C]], index=list('ABC')).T
%timeit pd.DataFrame.from_dict({'A' : A, 'B' : B, 'C' : C}, orient='index').T
from itertools import zip_longest
%timeit pd.DataFrame.from_records(list(zip_longest(A, B, C)), columns=['A', 'B', 'C'])

1.23 ms ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
977 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
545 µs ± 8.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So the last method is the fastest

Upvotes: 4

Related Questions