sectechguy

Reputation: 2117

Pandas: extract the part of a string after the second period from the right, or after the only period if there is just one

I am trying to extract everything after the second period, counting from right to left. Some file names contain only one period, so for those I want just the last part. Others contain more than two periods. Is there a clever regex way to accomplish this?

df
    file_name
0   image001.png 
1   image002.jpg
2   image003.jpg
3   1234_001.pdf
4   machine datasheet.pdf
5   asdf_101010101.xlsx
6   not_malicious.docx.pdf
7   example.txt.scf
8   place 1010 - wiki edits.pdf
9   I LOVE YOU.TXT.vbs
10  test.test.read_this.pdf 

Desired output:

df
    file_name
0   png 
1   jpg
2   jpg
3   pdf
4   pdf
5   xlsx
6   docx.pdf
7   txt.scf
8   pdf
9   TXT.vbs
10  read_this.pdf 

Upvotes: 2

Views: 42

Answers (1)

harpan

Reputation: 8631

Split file_name on '.', then join and return the last two elements if the list has more than two elements; otherwise return only the last element.

df['file_name'].str.split('.').apply(lambda x: '.'.join(x[-2:]) if len(x)>2 else x[-1])

Output:

0               png
1               jpg
2               jpg
3               pdf
4               pdf
5              xlsx
6          docx.pdf
7           txt.scf
8               pdf
9           TXT.vbs
10    read_this.pdf
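
Since the question asked about regex, here is a minimal sketch of one possible pattern, shown with the standard-library re module so it runs without a DataFrame. The names TAIL and tail_after_second_period are hypothetical, and returning a name with no period unchanged is an assumption chosen to mirror x[-1] in the split version above.

```python
import re

# One possible pattern (an illustration, not the only way to write it):
#   \.             a literal period that precedes the captured tail
#   (?:[^.]+\.)?   optionally one more dot-free segment plus its period
#   [^.]+$         the final dot-free segment at the end of the string
TAIL = re.compile(r"\.((?:[^.]+\.)?[^.]+)$")


def tail_after_second_period(name):
    """Return the text after the second period from the right,
    or after the only period when there is just one."""
    match = TAIL.search(name)
    # Assumption: a name with no period at all is returned unchanged.
    return match.group(1) if match else name


names = ["image001.png", "not_malicious.docx.pdf", "test.test.read_this.pdf", "README"]
print([tail_after_second_period(n) for n in names])
```

In pandas the same pattern could be applied with df['file_name'].str.extract(TAIL.pattern, expand=False). Note that str.extract yields NaN, not the original name, for rows that contain no period.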

Upvotes: 3
