Reputation: 2117
I am trying to extract everything after the second period in each file name, counting from right to left. Some names don't have two periods, so for those it would be just the last part. Others have more than two periods. Is there a clever regex way to accomplish this?
df
file_name
0 image001.png
1 image002.jpg
2 image003.jpg
3 1234_001.pdf
4 machine datasheet.pdf
5 asdf_101010101.xlsx
6 not_malicious.docx.pdf
7 example.txt.scf
8 place 1010 - wiki edits.pdf
9 I LOVE YOU.TXT.vbs
10 test.test.read_this.pdf
Desired output:
df
file_name
0 png
1 jpg
2 jpg
3 pdf
4 pdf
5 xlsx
6 docx.pdf
7 txt.scf
8 pdf
9 TXT.vbs
10 read_this.pdf
Upvotes: 2
Views: 42
Reputation: 8631
You need to split file_name on '.' and then return the last two elements of the list, joined with '.', if the list has more than two elements; otherwise return the last element.
df['file_name'].str.split('.').apply(lambda x: '.'.join(x[-2:]) if len(x)>2 else x[-1])
Output:
0 png
1 jpg
2 jpg
3 pdf
4 pdf
5 xlsx
6 docx.pdf
7 txt.scf
8 pdf
9 TXT.vbs
10 read_this.pdf
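Since the question asks for a regex, here is a minimal sketch of a pattern that gives the same results on the sample names. The names TAIL and tail_after_second_dot are made up for illustration, and the sketch uses only the standard re module. One assumed difference from the split version: a name with no period at all yields None here, whereas the split version returns the whole name.

```python
import re

# Hypothetical pattern name. Anchored at the end of the string: a literal dot,
# then optionally one "part." group, then the final part. Only the text after
# that leading dot is captured.
TAIL = re.compile(r"\.((?:[^.]+\.)?[^.]+)$")

def tail_after_second_dot(name):
    """Return the last two dot-separated parts if the name has at least two
    dots, the last part if it has exactly one, and None if it has none."""
    match = TAIL.search(name)
    return match.group(1) if match else None

names = [
    "image001.png",
    "not_malicious.docx.pdf",
    "I LOVE YOU.TXT.vbs",
    "test.test.read_this.pdf",
]
tails = [tail_after_second_dot(n) for n in names]
print(tails)
```

In pandas the same pattern string should work with df['file_name'].str.extract(r'\.((?:[^.]+\.)?[^.]+)$', expand=False), which returns NaN for rows that do not match.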
Upvotes: 3