Islacine

Reputation: 105

How to add new rows to a Pandas Data Frame with varying column numbers?

I want to add new rows to a Pandas data frame regardless of the order and number of columns in each new row.

As I add new rows, I want my data frame to look like the one below. Each row can have a different number of columns.

---- | 1    | 2    | 3    | 4
row1 | data | data |      |
row2 | data | data | data |
row3 | data |      |      |
row4 | data | data | data | data

Upvotes: 0

Views: 183

Answers (2)

OguzhanKa

Reputation: 60

In pandas you can concatenate a new row with an existing data frame, even if the new row has a different number of columns, as shown below.

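import pandas as pd

# Existing frame with five columns (0..4)
df = pd.DataFrame([list(range(5))])
# New row with only four columns (0..3)
new_row = pd.DataFrame([list(range(4))])
# pd.concat returns a new frame, so assign the result to keep it
df = pd.concat([df, new_row], ignore_index=True, axis=0)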

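In the above code snippet, the pd.concat function combines the two data frames along the rows. Columns are aligned by label, so any column missing from the new row is filled with NaN. The argument ignore_index=True discards the original row labels and renumbers the rows 0, 1, 2, and so on.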
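
To make the effect of ignore_index visible, here is a minimal sketch that reuses the two frames from above and compares the result with and without the argument. The names combined and kept_labels are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([list(range(5))])       # one row, columns 0..4
new_row = pd.DataFrame([list(range(4))])  # one row, columns 0..3

# With ignore_index=True the rows are renumbered 0, 1
combined = pd.concat([df, new_row], ignore_index=True)

# Without it, both rows keep their original label 0
kept_labels = pd.concat([df, new_row])

print(combined)           # column 4 is NaN in the second row
print(kept_labels.index)  # duplicated labels
```

The missing value in column 4 comes from column alignment, not from ignore_index; the argument only affects the row labels.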

Upvotes: 0

Swier

Reputation: 4186

Building pandas DataFrames one row at a time is typically very slow. One solution is to first gather the data in a dictionary, and then turn it into a dataframe for further processing:

import pandas as pd

# Each key becomes a row label; the lists may have different lengths
d = {
    'att1': ['a', 'b'],
    'att2': ['c', 'd', 'e'],
    'att3': ['f'],
    'att4': ['g', 'h', 'i', 'j'],
}
df = pd.DataFrame.from_dict(d, orient='index')

Which results in df containing:

        0    1    2    3
att1    a    b    None None
att2    c    d    e    None
att3    f    None None None
att4    g    h    i    j
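
The same idea should also work if the rows are collected in a plain list first. Below is a minimal sketch for the layout in the question, assuming each row is a Python list; the names rows and the row1..row4 labels are taken from the question's table and are otherwise illustrative.

```python
import pandas as pd

# Collect the rows first; they may have different lengths
rows = [
    ["data", "data"],
    ["data", "data", "data"],
    ["data"],
    ["data", "data", "data", "data"],
]

# Build the frame once; shorter rows are padded with missing values
df = pd.DataFrame(rows, index=["row1", "row2", "row3", "row4"])

# Relabel the columns 1..4 to match the table in the question
df.columns = range(1, df.shape[1] + 1)

print(df)
```

Building the frame once from the collected rows avoids repeated concatenation, which copies the data on every call.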

Alternatively, and more in line with typical pandas formats, store the data in one long Series in which 'att1' is the index label for the values 'a' and 'b', and so on:

series = df.stack().reset_index(level=1, drop=True)

which allows for easy selection of various attributes:

series.loc[['att1', 'att3']]

returning:

att1    a
att1    b
att3    f
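
One caveat with the stack() step: as far as I know, older pandas versions drop the None padding automatically, while the newer stack implementation keeps it, so check the behavior on your version. A minimal sketch that calls dropna() explicitly, which should give the same Series either way (the name selected is illustrative):

```python
import pandas as pd

d = {
    'att1': ['a', 'b'],
    'att2': ['c', 'd', 'e'],
    'att3': ['f'],
    'att4': ['g', 'h', 'i', 'j'],
}
df = pd.DataFrame.from_dict(d, orient='index')

# Stack to long format, drop the padding, and keep only the
# attribute name as the index
series = df.stack().dropna().reset_index(level=1, drop=True)

selected = series.loc[['att1', 'att3']]
print(selected)
```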

Upvotes: 1
