Clara

Reputation: 49

Iterate over rows in Pandas DataFrame and generate new columns

I have a DataFrame with one column called cardinality, where each row holds a list of n numbers. I would like to iterate over the rows and put each element of the list into its own new column, so that I end up with n columns.

Input:

import pandas as pd
input_data = pd.DataFrame({'cardinality': [[0, 1, 0, 0, 0], [1, 0, 1, 0, 0], [0, 0, 1, 3, 0], [0, 0, 0, 1, 1]]})

Desired output:

c_0  c_1  c_2  c_3  c_4
0    1    0    0    0
1    0    1    0    0
0    0    1    3    0
0    0    0    1    1

Here is my code:

iterator = 0
for row in input_data.itertuples():
    for cardinality in input_data['cardinality']:
        col_name = 'c_' + str(iterator)
        input_data[col_name] = [item[0] for item in input_data['cardinality']]
        input_data['cardinality'] = input_data['cardinality'].apply(lambda x: x.pop(0))
        iterator += 1

It seems like the .pop method is not the right way to remove the first item from the list.

Upvotes: 2

Views: 93

Answers (2)

Ben.T

Reputation: 29635

You can call tolist on the Series and build a new DataFrame from the result, then use add_prefix to name the columns as expected.

output = pd.DataFrame(input_data['cardinality'].tolist()).add_prefix('c_')
print(output)
   c_0  c_1  c_2  c_3  c_4
0    0    1    0    0    0
1    1    0    1    0    0
2    0    0    1    3    0
3    0    0    0    1    1
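
For completeness, here is a self-contained sketch of the same idea. It assumes every list has the same length. Passing index= and joining the result back onto the original frame are additions of mine that the question does not ask for; the names output and combined are only illustrative.

```python
import pandas as pd

input_data = pd.DataFrame(
    {'cardinality': [[0, 1, 0, 0, 0], [1, 0, 1, 0, 0], [0, 0, 1, 3, 0], [0, 0, 0, 1, 1]]}
)

# One column per list position; reusing the original index keeps the
# row labels aligned even if input_data does not have a default index.
output = pd.DataFrame(
    input_data['cardinality'].tolist(), index=input_data.index
).add_prefix('c_')

# Optional (an assumption, not required by the question):
# keep the new columns next to the original 'cardinality' column.
combined = input_data.join(output)
```

This avoids mutating the lists in place, which is where the loop with pop runs into trouble: apply(lambda x: x.pop(0)) returns the popped element, so the cardinality column ends up holding single numbers instead of the remaining lists.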

Upvotes: 3

Sayandip Dutta

Reputation: 15872

You can use the pd.DataFrame constructor and rename, passing a callable that formats each column label:

>>> pd.DataFrame(input_data['cardinality'].tolist()).rename(columns='c_{}'.format)
   c_0  c_1  c_2  c_3  c_4
0    0    1    0    0    0
1    1    0    1    0    0
2    0    0    1    3    0
3    0    0    0    1    1
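
This works because rename accepts a function for columns and applies it to every label, so 'c_{}'.format turns the default integer label 0 into 'c_0'. A minimal sketch of that mechanism, using a small hypothetical frame (df, renamed and same are illustrative names, not from the question):

```python
import pandas as pd

# Hypothetical data; the columns get the default integer labels 0 and 1.
df = pd.DataFrame([[0, 1], [1, 0]])

# The bound method 'c_{}'.format is called once per column label.
renamed = df.rename(columns='c_{}'.format)

# Equivalent spelling with an explicit lambda.
same = df.rename(columns=lambda label: f'c_{label}')
```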

Upvotes: 3
