Farzad Saif

Reputation: 107

Converting a list with no tuples into a data frame

Normally, when you want to turn a set of data into a data frame, you make a list for each column, create a dictionary from those lists, then create a data frame from the dictionary.

The data frame I want to create has 75 columns, all with the same number of rows. Defining the lists one by one isn't going to work. Instead, I decided to make a single list and iteratively put a chunk of it into each column of the data frame. Here is an example where I turn a list into a data frame:

lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Example list

df = 
   a b c d e
0  0 2 4 6 8
1  1 3 5 7 9

# Result I want from the example list

Here is my test code:

import pandas as pd
import numpy as np

dict = {'a':[], 'b':[], 'c':[], 'd':[], 'e':[]}
df = pd.DataFrame(dict)

# Here is my test data frame, it contains 5 columns and no rows.

lst = np.arange(10).tolist()

# This is my test list, it looks like this: lst = [0, 1, …, 9]

for i in range(len(lst)):
    df.iloc[:, i] = df.iloc[:, i]\
    .append(pd.Series(lst[2 * i:2 * i + 2]))

# This code is supposed to put two entries per column for the whole data frame.
# For the first column, i = 0, so [2 * (0):2 * (0) + 2] = [0:2]
# df.iloc[:, 0] = lst[0:2], so df.iloc[:, 0] = [0, 1]
# Second column i = 1, so [2 * (1):2 * (1) + 2] = [2:4]
# df.iloc[:, 1] = lst[2:4], so df.iloc[:, 1] = [2, 3]
# This is how the code was supposed to allocate lst to df.
# However it outputs an error.

When I run this code I get this error:

ValueError: cannot reindex from a duplicate axis

When I add ignore_index = True such that I have

for i in range(len(lst)):
    df.iloc[:, i] = df.iloc[:, i]\
    .append(pd.Series(lst[2 * i:2 * i + 2]), ignore_index = True)

I get this error:

IndexError: single positional indexer is out-of-bounds

After running the code, I check the results of df. The output is the same whether I ignore index or not.

In: df
Out:
   a   b   c   d   e
0  0 NaN NaN NaN NaN
1  1 NaN NaN NaN NaN

It seems that the first iteration of the loop runs fine, but the error occurs when trying to fill the second column.

Does anybody know how to get this to work? Thank you.

Upvotes: 0

Views: 83

Answers (1)

Scott Boston

Reputation: 153500

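IIUC, you can reshape the list with NumPy in column-major (Fortran) order and pass the result straight to the DataFrame constructor:

import numpy as np
import pandas as pd

lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
alst = np.array(lst)
# order='F' fills the 2 x 5 array column by column, so each column gets a consecutive pair
df = pd.DataFrame(alst.reshape(2, -1, order='F'), columns=[*'abcde'])
print(df)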

Output:

   a  b  c  d  e
0  0  2  4  6  8
1  1  3  5  7  9
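
For the 75-column case, the same idea can be wrapped so the row count is inferred from the list length. This is only a minimal sketch: `list_to_frame` is a hypothetical helper name, not a pandas function, and it assumes the list length is an exact multiple of the number of columns. It uses `reshape(n_cols, -1).T` instead of `order='F'`, which should produce the same layout.

```python
import numpy as np
import pandas as pd

def list_to_frame(values, columns):
    """Split a flat list into consecutive equal-sized chunks, one chunk per column.

    Hypothetical helper for illustration; assumes len(values) is a multiple
    of len(columns).
    """
    arr = np.asarray(values)
    n_cols = len(columns)
    if arr.size % n_cols != 0:
        raise ValueError("list length must be a multiple of the number of columns")
    # Each row of reshape(n_cols, -1) holds one consecutive chunk;
    # transposing turns those chunks into columns.
    return pd.DataFrame(arr.reshape(n_cols, -1).T, columns=columns)

lst = list(range(10))
df = list_to_frame(lst, list("abcde"))
print(df)
```

With 75 columns you would pass your own list of 75 names as `columns`. As for the original loop, `range(len(lst))` runs 10 times against a frame with only 5 columns, which appears to be where the out-of-bounds IndexError comes from. Note also that `Series.append` was removed in pandas 2.0, so building the frame in one step avoids it entirely.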

Upvotes: 1
