Shridhar R Kulkarni

Reputation: 7063

Extracting the file extensions from file names in pandas

I have a column FileName in a pandas DataFrame that contains filenames as strings. A filename can itself contain dots ('.'). For example, a.b.c.d.txt is a txt file. I want to add another column, FileType, containing only the file extensions.

Sample DataFrame:

FileName

a.b.c.d.txt

j.k.l.exe

After processing:

FileName    FileType

a.b.c.d.txt txt

j.k.l.exe   exe

I tried the following:

X['FileType'] = X.FileName.str.split(pat='.')

This helps me split the string on '.'. But how do I get the last element, i.e. the file extension?

Something like

X['FileType'] = X.FileName.str.split(pat='.')[-1]

X['FileType'] = X.FileName.str.split(pat='.').pop(-1)

did not give the desired output.

Upvotes: 6

Views: 7245

Answers (2)

user3483203

Reputation: 51165

Option 1
apply

df['FileType'] = df.FileName.apply(lambda x: x.split('.')[-1])

Option 2
Use str twice

df['FileType'] = df.FileName.str.split('.').str[-1]

Option 2b
Use rsplit (thanks @cᴏʟᴅsᴘᴇᴇᴅ)

df['FileType'] = df.FileName.str.rsplit('.', n=1).str[-1]

All result in:

      FileName FileType
0  a.b.c.d.txt      txt
1    j.k.l.exe      exe

Python 3.6.4, Pandas 0.22.0
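For anyone who wants to reproduce this, here is a minimal self-contained sketch that builds the sample frame from the question and checks that the three options agree. The variable names (via_apply, via_split, via_rsplit) are only for illustration, and n is passed by keyword because newer pandas releases no longer accept it positionally.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'FileName': ['a.b.c.d.txt', 'j.k.l.exe']})

# Option 1: plain Python str.split on each element
via_apply = df.FileName.apply(lambda x: x.split('.')[-1])

# Option 2: vectorised split, then take the last piece of each list
via_split = df.FileName.str.split('.').str[-1]

# Option 2b: split only once, from the right
via_rsplit = df.FileName.str.rsplit('.', n=1).str[-1]

df['FileType'] = via_rsplit
print(df)
```

The second .str accessor is what the attempts in the question were missing: indexing the result of str.split directly indexes the Series itself, not the list held in each row.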

Upvotes: 9

cs95

Reputation: 402303

If you only need the extension and don't need to keep the rest of the filename as a separate piece, then I would recommend a list comprehension:

comprehension with str.rsplit

df['FileType'] = [f.rsplit('.', 1)[-1] for f in df.FileName.tolist()]
df

      FileName FileType
0  a.b.c.d.txt      txt
1    j.k.l.exe      exe
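One caveat, not covered by the sample data: if a filename has no dot at all, rsplit('.', 1)[-1] returns the whole name. Below is a hedged sketch of one way to handle that with str.rpartition. The helper name extension and the extra 'README' row are assumptions added purely for illustration.

```python
import pandas as pd

# Sample data from the question plus one hypothetical name without a dot
df = pd.DataFrame({'FileName': ['a.b.c.d.txt', 'j.k.l.exe', 'README']})

def extension(name):
    """Return the text after the last dot, or '' if there is no dot."""
    # rpartition gives ('', '', name) when the separator is not found
    _head, dot, tail = name.rpartition('.')
    return tail if dot else ''

df['FileType'] = [extension(f) for f in df.FileName]
print(df)
```

Whether an empty string is the right result for a name without an extension depends on your data, so treat that choice as a placeholder.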

If you want to split each filename into its name and its extension, there are a couple of options.

os.path.splitext

import os
import pandas as pd

pd.DataFrame(
    [os.path.splitext(f) for f in df.FileName],
    columns=['Name', 'Type']
)
 
      Name  Type
0  a.b.c.d  .txt
1    j.k.l  .exe
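Note that os.path.splitext keeps the leading dot on the extension, while the output requested in the question has none. A small sketch of one way to drop it, assuming the same sample frame as in the question:

```python
import os
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'FileName': ['a.b.c.d.txt', 'j.k.l.exe']})

# splitext returns (root, ext) with ext like '.txt'; strip the leading dot
df['FileType'] = [os.path.splitext(f)[1].lstrip('.') for f in df.FileName]
print(df)
```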

str.extract

df.FileName.str.extract(r'(?P<FileName>.*)(?P<FileType>\..*)', expand=True)

   FileName FileType
0   a.b.c.d     .txt
1     j.k.l     .exe
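If only the extension is needed, without the dot, a single capture group anchored at the end of the string should also work. This is a sketch under that assumption, not part of the original answer. The pattern below is my own, and with expand=False a single group comes back as a Series.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'FileName': ['a.b.c.d.txt', 'j.k.l.exe']})

# Capture everything after the last dot: a literal dot, then one or more
# non-dot characters up to the end of the string
df['FileType'] = df.FileName.str.extract(r'\.([^.]+)$', expand=False)
print(df)
```

Rows that do not match the pattern (for example, a name with no dot) get a missing value rather than raising.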

Upvotes: 3
