Reputation: 31
I have a data frame with the following:
df=pd.DataFrame(['DMA.CSV','NaN' , 'AEB.csv', 'Xy.PY'],columns=['File_Name'])
What is an efficient way to convert only the extension of each File_Name to lower case, leaving NaN unchanged? The output should look like this:
['DMA.csv','NaN' , 'AEB.csv', 'Xy.py']
Upvotes: 0
Views: 237
Reputation: 31
After a lot of research, I found the following method, which I think is quite simple:
df['File_Name'] = [x.rsplit('.', 1)[0] + '.' + x.rsplit('.', 1)[-1].lower() if '.' in str(x)
                   else x for x in df['File_Name']]
This leaves NaN values (and any other value without a dot) unchanged. It also handles file names with multiple dots, such as 'Hello.World.TXT', because only the text after the last dot is lowercased.
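The same idea can be sketched as a small helper, which may be easier to read than the one-liner. This is only an illustration: the name lower_last_suffix is hypothetical, and the sample frame uses a real np.nan rather than the string 'NaN' to show that non-string values pass through untouched.

```python
import numpy as np
import pandas as pd

def lower_last_suffix(name):
    """Lowercase only the text after the last dot; return other values as-is."""
    if isinstance(name, str) and '.' in name:
        stem, ext = name.rsplit('.', 1)  # split once, from the right
        return stem + '.' + ext.lower()
    return name  # NaN, None, or a name without any dot

# Hypothetical sample data, including a missing value and a multi-dot name
df = pd.DataFrame(['DMA.CSV', np.nan, 'AEB.csv', 'Hello.World.TXT'],
                  columns=['File_Name'])
df['File_Name'] = [lower_last_suffix(x) for x in df['File_Name']]
print(df['File_Name'].tolist())
```

The isinstance check is a design choice: it avoids calling rsplit on floats, which is what a genuine NaN is in pandas.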
Upvotes: 0
Reputation: 828
You can try this:
import pandas as pd
df=pd.DataFrame(['DMA.CSV','NaN' , 'AEB.csv', 'Xy.PY'],columns=['File_Name'])
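# Split once from the right, so names with several dots keep everything before the last one
for i, v in enumerate(df['File_Name'].str.rsplit('.', n=1)):
    if len(v) == 2:
        df.iloc[i, 0] = v[0] + '.' + v[1].lower()
    else:
        df.iloc[i, 0] = v[0]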
print(df)
  File_Name
0   DMA.csv
1       NaN
2   AEB.csv
3     Xy.py
Upvotes: 0
Reputation: 82755
Using os.path.splitext
Ex:
import pandas as pd
import os
df = pd.DataFrame(['Hello.world.txt', 'DMA.CSV', 'NaN', 'AEB.csv', 'Xy.PY'], columns=['File_Name'])
df["File_Name"] = [filename + ext.lower() if ext else filename for filename, ext in df["File_Name"].apply(os.path.splitext)]
print(df)
Output:
         File_Name
0  Hello.world.txt
1          DMA.csv
2              NaN
3          AEB.csv
4            Xy.py
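To see why os.path.splitext fits here, it may help to look at what it does with a few edge cases. The sketch below uses only the standard library; the sample names and the helper name lower_ext_splitext are hypothetical, chosen for illustration.

```python
import os

# Hypothetical sample names: several dots, a plain extension,
# no dot at all, and a leading-dot name
samples = ['Hello.world.TXT', 'DMA.CSV', 'NaN', '.bashrc']

def lower_ext_splitext(name):
    # splitext splits at the last dot and keeps that dot in ext,
    # so plain concatenation rebuilds the name
    stem, ext = os.path.splitext(name)
    return stem + ext.lower()

results = [lower_ext_splitext(s) for s in samples]
print(results)
```

Names without a dot come back with an empty extension, so 'NaN' is returned unchanged, and a leading-dot name such as '.bashrc' is treated as having no extension.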
Upvotes: 0
Reputation: 1582
You can also try this:
def lower_suffix(mystr):
    if '.' in mystr:
        return mystr[:mystr.rfind('.')] + mystr[mystr.rfind('.'):].lower()
    else:
        return mystr
df['File_Name'] = df['File_Name'].apply(lower_suffix)
print(df)
The function finds the last '.' in the file name, if there is one, and lowercases everything from that dot onward. Names without a dot are returned unchanged.
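The same "last dot onward" rule can also be expressed without a Python-level function, using a regular expression with Series.str.replace. This is a different technique from the apply-based answer above, shown only as a sketch; the sample data is hypothetical and uses a real np.nan to show that missing values stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a multi-dot name and a genuine missing value
df = pd.DataFrame(['Hello.World.TXT', 'DMA.CSV', np.nan, 'AEB.csv', 'Xy.PY'],
                  columns=['File_Name'])

# The pattern matches the last dot plus everything after it;
# the callable lowercases only that matched part
df['File_Name'] = df['File_Name'].str.replace(
    r'\.[^.]+$', lambda m: m.group(0).lower(), regex=True
)
print(df['File_Name'].tolist())
```

A callable replacement requires regex=True. Names that contain no dot do not match the pattern and are left as they are.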
Upvotes: 0
Reputation: 3967
This one excludes 'NaN' from the output:
df = df.File_Name.iloc[df[~df.File_Name.str.contains('NaN')].index].str.split('.', expand=True)
df.iloc[:,1] = df.iloc[:,1].str.lower()
df = df[0] + '.' + df[1]
Upvotes: 1