Rosellx

Reputation: 147

Interpreting Pandas Column Referencing Syntax

I have a basic background in using R for data wrangling but am new to Python. I came across this code snippet from a tutorial on Coursera.

Can someone please explain what columns={col:'Gold'+col[4:]}, inplace=True means?

(1) From my understanding, df.rename renames an existing column (to Gold, in the case of the first rename), but why is +col[4:] needed after it?

(2) Does setting the inplace argument to True mean the resulting df output is assigned back to the original df?

import pandas as pd

df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

for col in df.columns:
    if col[:2]=='01':
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    if col[:1]=='№':
        df.rename(columns={col:'#'+col[1:]}, inplace=True)

Thank you in advance.

Upvotes: 1

Views: 130

Answers (3)

Peter Kulik

Reputation: 299

inplace=True means that the columns are renamed directly in your original dataframe (df).

Your case (inplace=True):

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.rename(columns={"A": "a", "B": "c"}, inplace=True)

print(df.columns)
# Index(['a', 'c'], dtype='object')
# df already has the renamed columns, because inplace=True.

If you don't use inplace=True, the rename method returns a new dataframe and leaves the original unchanged, like this:

import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
new_frame = df.rename(columns={"A": "a", "B": "c"})

print(df.columns) 
# Index(['A', 'B'], dtype='object') 
# It contains the old column names

print(new_frame.columns)
# Index(['a', 'c'], dtype='object') 
# It's a new dataframe and has renamed columns

NOTE: In this case, the better approach is to assign the new dataframe back to the original variable (df):

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df = df.rename(columns={"A": "a", "B": "c"})

Upvotes: 0

yganalyst

Reputation: 494

if col[:2]=='01':
    #replace column name with text gold and all characters after 4th letter
    df.rename(columns={col:'Gold'+col[4:]}, inplace=True)

(1) Suppose col is the column name '01xx1234':
1. col[:2] == '01' is True.
2. 'Gold'+col[4:] => 'Gold'+'1234' => 'Gold1234'
3. So '01xx1234' is renamed to 'Gold1234'.
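
To see the slicing on its own, here is a minimal sketch using the made-up column name from the steps above ('01xx1234' is only an illustration, not a real column in the olympics file):

```python
# Hypothetical column name, taken from the worked example above.
col = "01xx1234"

prefix = col[:2]   # first two characters: "01"
rest = col[4:]     # everything from index 4 onward: "1234"

# The characters at index 2 and 3 ("xx") are dropped.
new_name = "Gold" + rest
print(prefix, rest, new_name)
```

So col[4:] is there to keep whatever follows the first four characters of the old name, so that only the prefix is swapped for 'Gold'.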

(2) inplace=True modifies the dataframe directly and returns None instead of a new dataframe.
If you leave this option out, you have to assign the result back yourself:
df = df.rename(columns={col:'Gold'+col[4:]})
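
A small sketch of that difference, using a toy dataframe (the column names "A" and "B" are just placeholders). It assumes a pandas version where rename with inplace=True returns None, which is what the documentation describes:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# With inplace=True the dataframe itself is changed and nothing useful is returned.
result = df.rename(columns={"A": "a"}, inplace=True)
print(result)
print(list(df.columns))

# Without inplace, the original is left alone and a new dataframe comes back.
renamed = df.rename(columns={"B": "b"})
print(list(df.columns))
print(list(renamed.columns))
```

This is also why writing df = df.rename(..., inplace=True) is a mistake: df would end up being None.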

Upvotes: 0

jezrael

Reputation: 863351

It means:

#for each column name
for col in df.columns:
    #check first 2 chars for 01
    if col[:2]=='01':
        #replace column name with text gold and all characters after 4th letter
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    #same as above, for Silver
    if col[:2]=='02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    #same as above, for Bronze
    if col[:2]=='03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    #check first letter
    if col[:1]=='№':
        #replace the first character with #
        df.rename(columns={col:'#'+col[1:]}, inplace=True)

Does declaring the function inplace as True mean to assign the resulting df output to the original dataframe

Yes, you are right. With inplace=True the column names are replaced directly in the original dataframe.
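
As a possible alternative to the loop, the same renaming can be sketched with a single rename call, since the columns argument also accepts a function. The helper clean_name and the sample column names below are assumptions for illustration, not the actual contents of olympics.csv:

```python
import pandas as pd

def clean_name(col):
    """Hypothetical helper that applies the same rules as the loop above."""
    if col[:2] == "01":
        return "Gold" + col[4:]
    if col[:2] == "02":
        return "Silver" + col[4:]
    if col[:2] == "03":
        return "Bronze" + col[4:]
    if col[:1] == "№":
        return "#" + col[1:]
    return col  # leave every other column name unchanged

# Made-up column names, used only to exercise the helper.
df = pd.DataFrame(columns=["№ Summer", "01 !", "02 !", "03 !", "01 !.1"])

# rename calls clean_name once per column label and returns a new dataframe.
df = df.rename(columns=clean_name)
print(list(df.columns))
```

Here the result is assigned back to df instead of using inplace=True, and the dataframe is only rebuilt once rather than once per matching column.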

Upvotes: 0
