Reputation: 147
I have a basic background in using R for data wrangling but am new to Python. I came across this code snippet from a tutorial on Coursera.
Can someone please explain to me what columns={col:'Gold'+col[4:]}, inplace=True means?
(1) From my understanding, df.rename renames an existing column (to Gold, in the case of the first rename call), but why is +col[4:] needed after it?
(2) Does setting the inplace argument to True mean that the resulting output is assigned back to the original df?
import pandas as pd
df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)
for col in df.columns:
    if col[:2]=='01':
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    if col[:1]=='№':
        df.rename(columns={col:'#'+col[1:]}, inplace=True)
Thank you in advance.
Upvotes: 1
Views: 130
Reputation: 299
inplace=True means the columns are renamed directly in your original dataframe (df); no new dataframe is created.
Your case (inplace=True):
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.rename(columns={"A": "a", "B": "c"}, inplace=True)
print(df.columns)
# Index(['a', 'c'], dtype='object')
# df already has the renamed columns, because inplace=True.
If you don't use inplace=True, the rename method returns a new dataframe and leaves the original unchanged, like this:
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
new_frame = df.rename(columns={"A": "a", "B": "c"})
print(df.columns)
# Index(['A', 'B'], dtype='object')
# It contains the old column names
print(new_frame.columns)
# Index(['a', 'c'], dtype='object')
# It's a new dataframe and has renamed columns
NOTE: In this case, the better approach is to assign the new dataframe back to the original variable (df):
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df = df.rename(columns={"A": "a", "B": "c"})
Upvotes: 0
Reputation: 494
if col[:2]=='01':
    # replace the column name with the text 'Gold' plus all characters after the 4th
    df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
(1) Suppose col holds the column name '01xx1234':
1. col[:2] == '01' is True
2. 'Gold'+col[4:] => 'Gold'+'1234' => 'Gold1234'
3. so '01xx1234' is replaced by 'Gold1234'.
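The slicing in these steps can be checked in plain Python, without pandas. This is only a small sketch: '01xx1234' is the made-up name from the steps above, and '01 !' is a hypothetical short name added to show what happens when nothing is left after the fourth character.

```python
# Hypothetical column name, used only for illustration.
col = "01xx1234"

prefix = col[:2]          # the first two characters
rest = col[4:]            # everything from index 4 onward (the 5th character and after)
new_name = "Gold" + rest  # string concatenation

print(prefix)    # 01
print(rest)      # 1234
print(new_name)  # Gold1234

# Slicing past the end of a string gives an empty string, not an error,
# so a 4-character name (hypothetical) simply becomes 'Gold'.
short = "01 !"
short_new = "Gold" + short[4:]
print(short_new)  # Gold
```

So +col[4:] keeps whatever follows the first four characters of the old name, which is what stops several renamed columns from all ending up with the identical name 'Gold'.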
(2) inplace=True
modifies the dataframe directly; rename then returns None instead of a new dataframe.
Without this option, you have to assign the result back yourself, like this:
df = df.rename(columns={col:'Gold'+col[4:]})
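A quick way to see the difference is to look at what rename returns in each case. This is a minimal sketch on a made-up one-column dataframe; the column name '01xx1234' is hypothetical.

```python
import pandas as pd

# With inplace=True: df itself is changed and the call returns None.
df = pd.DataFrame({"01xx1234": [1, 2, 3]})
result = df.rename(columns={"01xx1234": "Gold1234"}, inplace=True)
print(result)            # None
print(list(df.columns))  # ['Gold1234']

# Without inplace: the original is untouched and a new dataframe is returned.
df2 = pd.DataFrame({"01xx1234": [1, 2, 3]})
renamed = df2.rename(columns={"01xx1234": "Gold1234"})
print(list(df2.columns))      # ['01xx1234']
print(list(renamed.columns))  # ['Gold1234']
```

This is also why writing df = df.rename(..., inplace=True) is a mistake: it would overwrite df with None.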
Upvotes: 0
Reputation: 863351
It means:
# for each column name
for col in df.columns:
    # check whether the first 2 characters are '01'
    if col[:2]=='01':
        # replace the column name with the text 'Gold' plus all characters after the 4th
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    # same as above, for '02'
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    # same as above, for '03'
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    # check the first character
    if col[:1]=='№':
        # replace the first character with '#', keeping the rest of the name
        df.rename(columns={col:'#'+col[1:]}, inplace=True)
Does declaring the function inplace as True mean to assign the resulting df output to the original dataframe
Yes, you are right. It replaces the column names in place, in the original dataframe.
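As a side note, the same renaming can probably be done in a single rename call without inplace, because rename also accepts a function for columns. The sketch below is not the tutorial's code: the helper clean_name, the PREFIXES mapping, and the stand-in column names are all assumptions made for illustration, since the real olympics.csv is not shown here.

```python
import pandas as pd

# Stand-in dataframe; these column names are made up in the style the loop expects.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["№ Summer", "01 !", "02 !", "03 !", "Total"],
)

# Hypothetical mapping from the 2-character prefix to the medal name.
PREFIXES = {"01": "Gold", "02": "Silver", "03": "Bronze"}

def clean_name(col):
    """Return the new name for one column (hypothetical helper)."""
    if col[:2] in PREFIXES:
        # medal name plus everything after the 4th character
        return PREFIXES[col[:2]] + col[4:]
    if col[:1] == "№":
        # swap the leading '№' for '#', keep the rest
        return "#" + col[1:]
    return col  # leave every other column unchanged

# rename calls clean_name once per column and returns a new dataframe.
df = df.rename(columns=clean_name)
print(list(df.columns))
# ['# Summer', 'Gold', 'Silver', 'Bronze', 'Total']
```

The loop version works just as well; this form only avoids calling rename once per column.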
Upvotes: 0