David542

Reputation: 110143

Adding a new column or row as pd.Series

I am trying to add one Column and one Row by using a pd.Series object. Here is what I have so far:

import pandas as pd
df = pd.DataFrame([
    {"Title": "Titanic",    "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"}
])

# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993, 'Steven Spielberg'])
df.loc[2] = new_movie_row

# Add a new column
new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'])
df['Keyword'] = new_keyword_column
df

This seems to add the column fine, but the new row comes out as all NaN.

What would be the correct way to do this?

Upvotes: 0

Views: 1167

Answers (2)

jezrael

Reputation: 862591

Adding a new row or column from a Series uses alignment: pandas tries to match the Series index values against the DataFrame's columns (for a row) or index (for a column), and any label with no match gets NaN.

Your approach is fine; you only need to give the Series for the new row the same index values as the DataFrame's columns:

# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993, 'Steven Spielberg'], index=df.columns)
df.loc[2] = new_movie_row
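
To see why the original row was all NaN, here is a rough sketch of the label matching. It uses reindex as a stand-in for what the assignment does, which is my simplification; the names unlabeled and labeled are just for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {"Title": "Titanic",    "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])

values = ["Jurassic Park", 1993, "Steven Spielberg"]

# Default index is 0, 1, 2, which shares no labels with df.columns.
unlabeled = pd.Series(values)
# Labelled with the column names, so each value can find its column.
labeled = pd.Series(values, index=df.columns)

# Look each column name up in the Series; a missing label yields NaN.
aligned_unlabeled = unlabeled.reindex(df.columns)
aligned_labeled = labeled.reindex(df.columns)

print(aligned_unlabeled)
print(aligned_labeled)
```

The first Series has no entry for Title, ReleaseYear or Director, so every lookup misses; the second has all three.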

With the default DataFrame index, the default Series index already matches, so the column works as-is. For data with any other index, set it explicitly for the column too:

# Add a new column
new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'], index=df.index)
df['Keyword'] = new_keyword_column

print (df)
           Title  ReleaseYear          Director   Keyword
0        Titanic         1997     James Cameron      Boat
1     Spider-Man         2002         Sam Raimi    Spider
2  Jurassic Park         1993  Steven Spielberg  Dinosaur
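
A small sketch of the non-default case, assuming a hypothetical frame whose rows are labelled a, b, c (not the frame from the question):

```python
import pandas as pd

# Hypothetical frame with string row labels instead of the default 0, 1, 2.
df = pd.DataFrame(
    {"Title": ["Titanic", "Spider-Man", "Jurassic Park"]},
    index=["a", "b", "c"],
)

keywords = ["Boat", "Spider", "Dinosaur"]

# Default Series index (0, 1, 2) matches none of the row labels a, b, c.
df["Unaligned"] = pd.Series(keywords)
# Reusing df.index makes every value line up with its row.
df["Keyword"] = pd.Series(keywords, index=df.index)

print(df)
```

Here the Unaligned column should come out empty while Keyword is filled, which is why passing index=df.index is the safer habit.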

In general, though, a new row or column can be given as a list or 1-D array of matching length (or as a scalar, to fill it with one value):

# Add a new row
df.loc[2] = ['Jurassic Park', 1993, 'Steven Spielberg']

# Add a new column
df['Keyword'] = ['Boat', 'Spider', 'Dinosaur']

# Add a new column with same values
df['same vals'] = 10


Why would you need a Series rather than just a list?

Only when some of the input data is missing; then a Series is needed so the values can be aligned by label:

# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993], index=['Title','ReleaseYear'])
df.loc[2] = new_movie_row
print (df)
           Title  ReleaseYear       Director
0        Titanic         1997  James Cameron
1     Spider-Man         2002      Sam Raimi
2  Jurassic Park         1993            NaN

Or specify the columns explicitly:

df.loc[2, ['Title','ReleaseYear']] = ['Jurassic Park', 1993]

With a plain list that is shorter than the number of columns, you get an error:

df.loc[3] = ['Jurassic Park', 1993]
print (df)

>ValueError: cannot set a row with mismatched columns
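
Putting the two cases side by side, as a minimal sketch (the dict name partial is mine, used only for illustration):

```python
import pandas as pd

df = pd.DataFrame([
    {"Title": "Titanic",    "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])

# Hypothetical incomplete record: there is no Director entry.
partial = {"Title": "Jurassic Park", "ReleaseYear": 1993}

try:
    # Two values for three columns: pandas cannot tell where they belong.
    df.loc[2] = list(partial.values())
except ValueError as exc:
    print("list failed:", exc)

# A Series built from the dict carries the labels, so Director is left missing.
df.loc[2] = pd.Series(partial)
print(df)
```

The failed list assignment should leave the frame untouched, so the Series assignment afterwards adds row 2 normally.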

Upvotes: 3

Ch3steR

Reputation: 20669

Pandas tries to align on index/column names; this is called data alignment. To bypass it, we can use .tolist() here:

df.loc[2] = new_movie_row.tolist()
df
           Title  ReleaseYear          Director
0        Titanic         1997     James Cameron
1     Spider-Man         2002         Sam Raimi
2  Jurassic Park         1993  Steven Spielberg

The same applies when adding columns:

new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'],index=[4,5,6])  # Notice the Index is 4, 5, 6.

df['new'] = new_keyword_column
df
           Title  ReleaseYear          Director  new
0        Titanic         1997     James Cameron  NaN
1     Spider-Man         2002         Sam Raimi  NaN
2  Jurassic Park         1993  Steven Spielberg  NaN

Since the indexes don't align, you get all NaN. To counter that, you can use .tolist():

df['new'] = new_keyword_column.tolist()
df
           Title  ReleaseYear          Director       new
0        Titanic         1997     James Cameron      Boat
1     Spider-Man         2002         Sam Raimi    Spider
2  Jurassic Park         1993  Steven Spielberg  Dinosaur
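
One thing to keep in mind: .tolist() throws the labels away, so the values are placed purely by position. A sketch with a hypothetical row whose labels are in a different order than the columns:

```python
import pandas as pd

df = pd.DataFrame([
    {"Title": "Titanic",    "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])

# Hypothetical row: correct labels, but not in the same order as df.columns.
row = pd.Series(
    ["Steven Spielberg", "Jurassic Park", 1993],
    index=["Director", "Title", "ReleaseYear"],
)

by_label = df.copy()
by_label.loc[2] = row              # matched on labels, order is irrelevant

by_position = df.copy()
by_position.loc[2] = row.tolist()  # labels dropped, values used in list order

print(by_label)
print(by_position)
```

With the labelled Series each value should land in its own column; with the list, the director's name ends up under Title. So .tolist() is only safe when the values are already in column order.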

Upvotes: 3
