Tom J

Reputation: 31

Converting part of a data frame field to lower case [Pandas]

I have a data frame with the following:

df=pd.DataFrame(['DMA.CSV','NaN' , 'AEB.csv', 'Xy.PY'],columns=['File_Name'])

What is an efficient way to convert only the extension of each File_Name to lower case, leaving the 'NaN' entries as they are? The output should look like this:

['DMA.csv','NaN' , 'AEB.csv', 'Xy.py']

Upvotes: 0

Views: 237

Answers (5)

Tom J

Reputation: 31

After a lot of research, I found the following method, which is quite simple, I feel:

df['File_Name'] = [x.rsplit('.',1)[0]+'.'+x.rsplit('.',1)[-1].lower() if '.' in str(x) 
   else x for x in df['File_Name']]

This leaves the NaN values untouched and also handles file names with multiple dots (such as 'Hello.World.TXT'), because only the part after the last '.' is lower-cased.
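To check the claim about NaN values and multiple dots, here is a minimal sketch of the same rsplit idea. The helper name lower_ext and the sample values (including a real np.nan next to the string 'NaN') are assumptions made for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a real missing value (np.nan), the string 'NaN',
# and a name that contains more than one dot.
df = pd.DataFrame(['Hello.World.TXT', np.nan, 'NaN', 'Xy.PY'], columns=['File_Name'])

def lower_ext(name):
    # Hypothetical helper: anything that is not a string with a dot is returned unchanged.
    if not isinstance(name, str) or '.' not in name:
        return name
    stem, ext = name.rsplit('.', 1)  # split on the last dot only
    return stem + '.' + ext.lower()

df['File_Name'] = [lower_ext(x) for x in df['File_Name']]
result = df['File_Name'].tolist()
print(result)
```

Checking isinstance(name, str) instead of '.' in str(x) also keeps the helper from calling rsplit on a non-string value such as a float.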

Upvotes: 0

heena bawa

Reputation: 828

You can try this:

import pandas as pd
df=pd.DataFrame(['DMA.CSV','NaN' , 'AEB.csv', 'Xy.PY'],columns=['File_Name'])
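# Split on the last dot only, so names with several dots keep their full stem
for i, v in enumerate(df['File_Name'].str.rsplit('.', n=1)):
    if len(v) == 2:
        # Lower-case just the extension and write it back to File_Name (column 0)
        df.iloc[i, 0] = v[0]+'.'+v[1].lower()
    # Entries without a dot (such as 'NaN') are left unchanged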

print(df)

  File_Name
0   DMA.csv
1       NaN
2   AEB.csv
3     Xy.py

Upvotes: 0

Rakesh

Reputation: 82755

Use os.path.splitext, which splits a file name into its root and its extension at the last dot. For example:

import pandas as pd
import os

df=pd.DataFrame(['Hello.world.txt', 'DMA.CSV','NaN' , 'AEB.csv', 'Xy.PY'],columns=['File_Name'])
df["File_Name"] = [ filename+ext.lower() if ext else filename for filename,ext in df["File_Name"].apply(os.path.splitext) ]
print(df)

Output:

         File_Name
0  Hello.world.txt
1          DMA.csv
2              NaN
3          AEB.csv
4            Xy.py
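As a side note, os.path.splitext does not behave exactly like splitting on the last dot: a leading dot is not treated as the start of an extension. The sketch below shows this on a few made-up names; the helper name lower_ext_splitext is hypothetical.

```python
import os

def lower_ext_splitext(name):
    # Hypothetical helper: lower-case only the extension found by os.path.splitext.
    root, ext = os.path.splitext(name)
    return root + ext.lower()

# Made-up names for illustration: several dots, no dot, and a leading dot.
for name in ['Hello.world.TXT', 'NaN', 'archive.TAR.GZ', '.ENV']:
    print(repr(name), '->', repr(lower_ext_splitext(name)))
```

Because '.ENV' has no extension as far as splitext is concerned, it comes back unchanged, whereas an approach based on rsplit('.', 1) would lower-case it.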

Upvotes: 0

Matina G

Reputation: 1582

You can also try this:

def lower_suffix(mystr):
    if '.' in mystr:
        return mystr[:mystr.rfind('.')]+mystr[mystr.rfind('.'):].lower()
    else:
        return mystr

df['File_Name'] = df['File_Name'].apply(lower_suffix)
print(df)

The function is applied to every file name: it finds the last '.', if there is one, and lower-cases everything from that dot onwards.
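The same "last dot onwards" logic can probably also be written without a Python-level function per row, using Series.str.replace with a regular expression and a callable replacement. This is only a sketch; the pattern and the extra 'Hello.World.TXT' entry are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame(['DMA.CSV', 'NaN', 'AEB.csv', 'Xy.PY', 'Hello.World.TXT'],
                  columns=['File_Name'])

# r'\.[^.]+$' matches the last dot and everything after it.
# The callable receives the match object and returns its lower-cased text.
df['File_Name'] = df['File_Name'].str.replace(
    r'\.[^.]+$', lambda m: m.group(0).lower(), regex=True
)
result = df['File_Name'].tolist()
print(result)
```

Names without a dot, such as 'NaN', do not match the pattern and are returned as they are.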

Upvotes: 0

meW

Reputation: 3967

This one drops the 'NaN' rows from the result entirely:

df = df.File_Name.iloc[df[~df.File_Name.str.contains('NaN')].index].str.split('.', expand=True)
df.iloc[:,1] = df.iloc[:,1].str.lower()
df = df[0] + '.' + df[1]
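If the 'NaN' rows should stay in the frame, as in the expected output of the question, a similar split-and-rejoin idea can be applied through a boolean mask instead of dropping rows. This is a rough sketch that assumes at least one name contains a dot (otherwise the second column of parts would not exist); the names parts and has_ext are made up here.

```python
import pandas as pd

df = pd.DataFrame(['DMA.CSV', 'NaN', 'AEB.csv', 'Hello.World.TXT'],
                  columns=['File_Name'])

# Split once from the right: column 0 is the stem, column 1 the extension
# (missing for names that contain no dot).
parts = df['File_Name'].str.rsplit('.', n=1, expand=True)
has_ext = parts[1].notna()

# Rebuild only the rows that actually have an extension.
df.loc[has_ext, 'File_Name'] = (
    parts.loc[has_ext, 0] + '.' + parts.loc[has_ext, 1].str.lower()
)
print(df)
```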

Upvotes: 1
