Reputation: 7063
I have a column FileName in a pandas DataFrame that holds filenames as strings. A filename can itself contain dots ('.'). For example, a.b.c.d.txt is a txt file. I just want another column, FileType, containing only the file extensions.
Sample DataFrame:
FileName
a.b.c.d.txt
j.k.l.exe
After processing:
FileName     FileType
a.b.c.d.txt  txt
j.k.l.exe    exe
I tried the following:
X['FileType'] = X.FileName.str.split(pat='.')
This helps me split the string on '.'. But how do I get the last element, i.e. the file extension?
Something like
X['FileType'] = X.FileName.str.split(pat='.')[-1]
X['FileType'] = X.FileName.str.split(pat='.').pop(-1)
did not give the desired output.
Upvotes: 6
Views: 7245
Reputation: 51165
Option 1
apply
df['FileType'] = df.FileName.apply(lambda x: x.split('.')[-1])
Option 2
Use str twice
df['FileType'] = df.FileName.str.split('.').str[-1]
Option 2b
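Use rsplit (thanks @cᴏʟᴅsᴘᴇᴇᴅ), which splits from the right so only the last dot matters. Passing n by keyword also works on newer pandas versions, where it is keyword-only.
df['FileType'] = df.FileName.str.rsplit('.', n=1).str[-1]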
All result in:
      FileName  FileType
0  a.b.c.d.txt       txt
1    j.k.l.exe       exe
Python 3.6.4, Pandas 0.22.0
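To make the difference from the attempts in the question concrete, here is a minimal self-contained sketch. It assumes a frame with the default integer index, where plain [-1] on the split result is a label lookup on the Series rather than "last item of each list". The variable names parts and plain_index_failed are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe"]})

# .str.split returns a Series whose elements are lists of parts.
parts = df["FileName"].str.split(".")

# Plain [-1] looks for the index label -1 on the Series itself.
# With the default integer index there is no such label, so it fails.
try:
    parts[-1]
    plain_index_failed = False
except KeyError:
    plain_index_failed = True

# .str[-1] indexes into each list element-wise instead.
df["FileType"] = parts.str[-1]
print(df)
```

The second .str accessor is what turns "index the Series" into "index every element of the Series".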
Upvotes: 9
Reputation: 402303
If you don't want to split the extension from the filename, then I would recommend a list comprehension with str.rsplit:
df['FileType'] = [f.rsplit('.', 1)[-1] for f in df.FileName.tolist()]
df
      FileName  FileType
0  a.b.c.d.txt       txt
1    j.k.l.exe       exe
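One thing to watch with rsplit: for a name with no dot at all, rsplit('.', 1)[-1] returns the whole name. If that matters for your data, a small helper based on str.rpartition can return an empty string instead. This is only a sketch; the helper name extension_or_empty and the extra README row are made up for illustration.

```python
import pandas as pd


def extension_or_empty(name: str) -> str:
    """Return the text after the last dot, or '' if there is no dot."""
    # rpartition splits on the last dot; sep is '' when no dot is found.
    _, sep, ext = name.rpartition(".")
    return ext if sep else ""


# "README" is a hypothetical extra row with no extension.
df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe", "README"]})

# Same list-comprehension pattern as above, with the helper swapped in.
df["FileType"] = [extension_or_empty(f) for f in df["FileName"]]

# For comparison: the plain rsplit version keeps the whole name for "README".
rsplit_result = [f.rsplit(".", 1)[-1] for f in df["FileName"]]
print(df)
```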
If you want to split the name and the extension into separate columns, there are a couple of options.
os.path.splitext
import os
pd.DataFrame(
[os.path.splitext(f) for f in df.FileName],
columns=['Name', 'Type']
)
      Name  Type
0  a.b.c.d  .txt
1    j.k.l  .exe
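Note that os.path.splitext keeps the leading dot on the extension. If you want the FileType column exactly as shown in the question (no dot), one possible sketch is to strip it afterwards. The sample frame is rebuilt here so the snippet runs on its own.

```python
import os

import pandas as pd

df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe"]})

# splitext returns (root, ext), where ext still has its leading dot (".txt").
# lstrip(".") drops that dot; a name without an extension yields "".
df["FileType"] = [os.path.splitext(f)[1].lstrip(".") for f in df["FileName"]]
print(df)
```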
str.extract
df.FileName.str.extract(r'(?P<FileName>.*)(?P<FileType>\..*)', expand=True)
  FileName  FileType
0  a.b.c.d      .txt
1    j.k.l      .exe
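If you only need the extension, without the dot, a narrower pattern with a single capture group is another way to use str.extract. This is a sketch, not the only possible regex: it assumes the extension is whatever follows the last dot and contains no further dots.

```python
import pandas as pd

df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe"]})

# \.([^.]+)$ matches the last dot and captures everything after it.
# expand=False returns a Series, so it can be assigned to a column directly.
df["FileType"] = df["FileName"].str.extract(r"\.([^.]+)$", expand=False)
print(df)
```

Names that contain no dot do not match the pattern and end up as NaN in FileType.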
Upvotes: 3