Reputation: 110143
I am trying to add one column and one row to a DataFrame using a pd.Series object. Here is what I have so far:
import pandas as pd
df = pd.DataFrame([
    {"Title": "Titanic", "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])
# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993, 'Steven Spielberg'])
df.loc[2] = new_movie_row
# Add a new column
new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'])
df['Keyword'] = new_keyword_column
df
This seems to add the column fine, but the new row ends up as all NaN.
What would be the correct way to do this?
Upvotes: 0
Views: 1167
Reputation: 862591
Adding a new row or column from a Series uses alignment: pandas matches the Series index values against the DataFrame's columns (for a row) or index (for a column), and any label with no match gets NaN.
Your approach is fine; you only need to give the Series for the new row the same index values as the DataFrame's columns:
# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993, 'Steven Spielberg'], index=df.columns)
df.loc[2] = new_movie_row
For the new column, a DataFrame with the default index already matches the default index of the Series, so it works without changes; for general data, set the index explicitly here too:
# Add a new column
new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'], index=df.index)
df['Keyword'] = new_keyword_column
print (df)
           Title  ReleaseYear          Director   Keyword
0        Titanic         1997     James Cameron      Boat
1     Spider-Man         2002         Sam Raimi    Spider
2  Jurassic Park         1993  Steven Spielberg  Dinosaur
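The label matching described above can be checked without modifying the DataFrame. The following is a minimal sketch, not pandas' internal code path: it uses Series.reindex to look up each column name in the Series, which should mirror what happens on row assignment. The names unlabelled and labelled are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    {"Title": "Titanic", "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])

values = ["Jurassic Park", 1993, "Steven Spielberg"]

# Without index=..., the Series is labelled 0, 1, 2.
unlabelled = pd.Series(values)
# With index=df.columns, it is labelled Title, ReleaseYear, Director.
labelled = pd.Series(values, index=df.columns)

# reindex looks each column name up in the Series labels.
unlabelled_lookup = unlabelled.reindex(df.columns)  # no label matches
labelled_lookup = labelled.reindex(df.columns)      # every label matches

print(unlabelled_lookup.isna().all())
print(labelled_lookup.tolist())
```

The first lookup finds no matching labels, which is why the row in the question came out as all NaN.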
In general, though, if you need a new row or column you can use a list or 1D array of the same length (or a scalar if every value should be the same):
# Add a new row
df.loc[2] = ['Jurassic Park', 1993, 'Steven Spielberg']
# Add a new column
df['Keyword'] = ['Boat', 'Spider', 'Dinosaur']
# Add a new column with same values
df['same vals'] = 10
Why would you need a Series rather than just a list?
Only when some of the input data is missing; then a Series is needed so the values can be aligned by label:
# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993], index=['Title','ReleaseYear'])
df.loc[2] = new_movie_row
print (df)
           Title  ReleaseYear       Director
0        Titanic         1997  James Cameron
1     Spider-Man         2002      Sam Raimi
2  Jurassic Park         1993            NaN
Or specify the columns as well:
df.loc[2, ['Title','ReleaseYear']] = ['Jurassic Park', 1993]
With only a list you get an error:
df.loc[3] = ['Jurassic Park', 1993]
print (df)
ValueError: cannot set a row with mismatched columns
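To see both behaviours side by side, here is a small sketch that catches the error from the short list and then falls back to naming the target columns. The variable names short_row and caught are illustrative, and the exact dtype of ReleaseYear after the assignment may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame([
    {"Title": "Titanic", "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])

short_row = ["Jurassic Park", 1993]  # two values for three columns

caught = False
try:
    # A bare list is assigned by position, so its length must match.
    df.loc[2] = short_row
except ValueError:
    caught = True  # pandas cannot tell which column was left out

# Naming the target columns removes the ambiguity; Director is left as NaN.
df.loc[2, ["Title", "ReleaseYear"]] = short_row

print(caught)
print(df)
```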
Upvotes: 3
Reputation: 20669
Pandas tries to align on index/column names; this is called data alignment. We can use .tolist() here so the values are assigned by position instead.
df.loc[2] = new_movie_row.tolist()
df
           Title  ReleaseYear          Director
0        Titanic         1997     James Cameron
1     Spider-Man         2002         Sam Raimi
2  Jurassic Park         1993  Steven Spielberg
The same applies when adding columns:
new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'],index=[4,5,6]) # Notice the Index is 4, 5, 6.
df['new'] = new_keyword_column
df
           Title  ReleaseYear          Director  new
0        Titanic         1997     James Cameron  NaN
1     Spider-Man         2002         Sam Raimi  NaN
2  Jurassic Park         1993  Steven Spielberg  NaN
Since the indexes don't align, you get all NaN. To get around that, you can use .tolist():
df['new'] = new_keyword_column.tolist()
df
           Title  ReleaseYear          Director       new
0        Titanic         1997     James Cameron      Boat
1     Spider-Man         2002         Sam Raimi    Spider
2  Jurassic Park         1993  Steven Spielberg  Dinosaur
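A partial overlap makes the difference between the two assignments easier to see. In this sketch the Series shares only labels 1 and 2 with the DataFrame; the value "Extra" and the column names by_label and by_position are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Title": ["Titanic", "Spider-Man", "Jurassic Park"]})

# Labels 1 and 2 exist in df.index; label 3 does not, and label 0 is absent.
keywords = pd.Series(["Spider", "Dinosaur", "Extra"], index=[1, 2, 3])

df["by_label"] = keywords              # aligned on index labels
df["by_position"] = keywords.tolist()  # labels discarded, order only

print(df)
```

With label alignment, row 0 gets NaN and "Extra" is dropped; with .tolist(), the three values land in rows 0 to 2 in order, whatever their original labels were.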
Upvotes: 3