Reputation: 2616
Say I have a dataframe like below:
df = pd.DataFrame({0:['Hello World!']}) # here df could have more than one column of data as shown below
df = pd.DataFrame({0:['Hello World!'], 1:['Hello Mars!']}) # or df could have more than one row of data as shown below
df = pd.DataFrame({0:['Hello World!', 'Hello Mars!']})
and I also have a list of column names like below:
new_col_names = ['a','b','c','d'] # here, len(new_col_names) might vary like below
new_col_names = ['a','b','c','d','e'] # but we can always be sure that the len(new_col_names) >= len(df.columns)
Given that, how can I replace the column names in df
so that the result looks like the examples below:
df = pd.DataFrame({0:['Hello World!']})
new_col_names = ['a','b','c','d']
# result would be like this
a b c d
Hello World! (empty string) (empty string) (empty string)
df = pd.DataFrame({0:['Hello World!'], 1:['Hello Mars!']})
new_col_names = ['a','b','c','d']
# result would be like this
a b c d
Hello World! Hello Mars! (empty string) (empty string)
df = pd.DataFrame({0:['Hello World!', 'Hello Mars!']})
new_col_names = ['a','b','c','d','e']
a b c d e
Hello World! (empty string) (empty string) (empty string) (empty string)
Hello Mars! (empty string) (empty string) (empty string) (empty string)
From reading around StackOverflow answers such as this, I have a vague idea that it could be something like below:
df[new_col_names] = '' # but this returns KeyError
# or this
df.columns=new_col_names # but this returns ValueError: Length mismatch (of course)
If someone could show me a way to overwrite the existing dataframe column names and, at the same time, add new columns with empty string values in the rows, I'd greatly appreciate the help.
Upvotes: 1
Views: 1276
Reputation: 62393
import pandas as pd
# function
def rename_add_col(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    c_len = len(df.columns)
    if c_len == len(cols):
        df.columns = cols
    else:
        df.columns = cols[:c_len]
        df = pd.concat([df, pd.DataFrame(columns=cols[c_len:])])
    return df
# create dataframe
t1 = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', '5', '6'], 'c': ['7', '8', '9']})
a b c
0 1 4 7
1 2 5 8
2 3 6 9
# call function
cols = ['d', 'e', 'f']
t1 = rename_add_col(t1, cols)
d e f
0 1 4 7
1 2 5 8
2 3 6 9
# call function
cols = ['g', 'h', 'i', 'new1', 'new2']
t1 = rename_add_col(t1, cols)
g h i new1 new2
0 1 4 7 NaN NaN
1 2 5 8 NaN NaN
2 3 6 9 NaN NaN
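The function above leaves NaN in the added columns and renames the columns of the frame that is passed in, whereas the question asks for empty strings. A minimal sketch of one possible adaptation follows. The name rename_and_pad and its fill parameter are hypothetical (they are not part of the answer above), and the sketch assumes len(cols) >= len(df.columns), as stated in the question.

```python
import pandas as pd


def rename_and_pad(df: pd.DataFrame, cols: list, fill="") -> pd.DataFrame:
    """Hypothetical helper: rename existing columns, then append the rest filled with `fill`."""
    n_existing = len(df.columns)
    out = df.copy()                     # work on a copy so the caller's frame is untouched
    out.columns = cols[:n_existing]     # rename the columns that already exist
    for name in cols[n_existing:]:      # add each remaining name as a constant column
        out[name] = fill
    return out


source = pd.DataFrame({0: ["Hello World!"], 1: ["Hello Mars!"]})
padded = rename_and_pad(source, ["a", "b", "c", "d"])
print(padded)
```

Passing fill=float("nan") instead would give the same NaN-filled result as the original function.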
Upvotes: 2
Reputation: 862511
The idea is to build a dictionary that maps the existing column names to the new ones with zip, rename only the existing columns, and then add all the new ones with DataFrame.reindex:
df = pd.DataFrame({0:['Hello World!', 'Hello Mars!']})
new_col_names = ['a','b','c','d','e']
df1 = (df.rename(columns=dict(zip(df.columns, new_col_names)))
         .reindex(new_col_names, axis=1, fill_value=''))
print (df1)
a b c d e
0 Hello World!
1 Hello Mars!
df1 = (df.rename(columns=dict(zip(df.columns, new_col_names)))
         .reindex(new_col_names, axis=1))
print (df1)
a b c d e
0 Hello World! NaN NaN NaN NaN
1 Hello Mars! NaN NaN NaN NaN
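To see why this works, it may help to split the chain into steps: zip stops at the shorter of its two inputs, so the dictionary only covers the columns that already exist, and reindex then creates the names that are still missing. A small sketch, using the two-column frame from the question (the intermediate variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({0: ["Hello World!"], 1: ["Hello Mars!"]})
new_col_names = ["a", "b", "c", "d"]

# zip pairs items until the shorter input (df.columns) runs out,
# so only the two existing columns get a mapping.
mapping = dict(zip(df.columns, new_col_names))
print(mapping)

# Step 1: rename the existing columns only.
renamed = df.rename(columns=mapping)

# Step 2: reindex along the columns; names not present yet are created
# and filled with fill_value instead of the default NaN.
padded = renamed.reindex(new_col_names, axis=1, fill_value="")
print(padded)
```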
Upvotes: 3
Reputation: 61
Use your old DataFrame to build a new one with the pd.DataFrame() constructor, and add the new columns through the columns parameter by list concatenation.
Note: the new columns are created for every row of the index but hold NaN values; a workaround is to call df.fillna('') on the result.
pd.DataFrame(df.to_dict(), columns=list(df.columns) + ['b', 'c'])
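The line above keeps the old column names and leaves NaN in the new columns. A rough sketch of how the two remaining steps (filling and renaming) could be added is shown below. It assumes that none of the new names already exists as a column in df, and the variable names extra and rebuilt are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({0: ["Hello World!", "Hello Mars!"]})
new_col_names = ["a", "b", "c"]

# Names that have no existing column to take over.
extra = new_col_names[len(df.columns):]

# Rebuild the frame; names listed in `columns` but absent from the dict become NaN columns.
rebuilt = pd.DataFrame(df.to_dict(), columns=list(df.columns) + extra)

# Replace the NaN placeholders with empty strings, then apply the full list of names.
rebuilt = rebuilt.fillna("")
rebuilt.columns = new_col_names
print(rebuilt)
```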
Hope this Helps! Cheers !
Upvotes: 1