Ahmad KAmolov

Reputation: 35

Python remove digits in the middle of the string

I am trying to iterate through the items in Python and remove the timestamp but keep the extension:

for item in items:
    print(item.split('_')[0])

This works, but it deletes the extension as well. The strings look like dataset_2020-01-05.txt, and I need them to become dataset.txt; likewise dataset_2020-01-05.zip should become dataset.zip.

I also tried this:

for item in items:
    print(item.split('_')[0] + '.' + item.split('.')[-1])

but some files don't have a timestamp, and this appends the extension to them a second time, so I end up with names like dataset.txt.txt.
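A minimal sketch that covers both cases in the question, assuming the timestamp always has the form _YYYY-MM-DD immediately before the extension. os.path.splitext separates the extension first, so names without a timestamp pass through unchanged. The helper name strip_timestamp is hypothetical.

```python
import os
import re

# Assumption: the timestamp is "_YYYY-MM-DD" at the very end of the stem.
TIMESTAMP = re.compile(r'_\d{4}-\d{2}-\d{2}$')

def strip_timestamp(name):
    """Hypothetical helper: drop a trailing _YYYY-MM-DD, keep the extension."""
    stem, ext = os.path.splitext(name)  # e.g. 'dataset_2020-01-05', '.txt'
    return TIMESTAMP.sub('', stem) + ext

items = ["dataset_2020-01-05.txt", "dataset_2020-01-05.zip", "dataset.txt"]
for item in items:
    print(strip_timestamp(item))
```

Because the extension is split off before the regex runs, the extension is never duplicated for files that have no timestamp.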

Upvotes: 0

Views: 198

Answers (4)

oflint_

Reputation: 297

To remove the timestamp, match the date with the re module and strip it from each entry in the items list.

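import re

items = ["dataset_2020-01-05.txt", "dataset_2020-01-05.zip", "dataset.txt"]
for i, item in enumerate(items):
    # Look for an "_YYYY-MM-DD" timestamp anywhere in the name
    match = re.search(r'_\d{4}-\d{2}-\d{2}', item)
    if match:
        # Remove the matched timestamp, keeping the name and extension
        items[i] = item.replace(match.group(), '')
print(items)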

Output

['dataset.txt', 'dataset.zip', 'dataset.txt']
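The search-then-replace step can also be collapsed into a single re.sub call, since re.sub returns the string unchanged when the pattern does not match. A sketch using the same pattern and sample list:

```python
import re

items = ["dataset_2020-01-05.txt", "dataset_2020-01-05.zip", "dataset.txt"]
# re.sub leaves names without a timestamp untouched,
# so no explicit "if match" check is needed.
items = [re.sub(r'_\d{4}-\d{2}-\d{2}', '', item) for item in items]
print(items)
```

This builds a new list rather than mutating items in place, which avoids the enumerate/index bookkeeping.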

Upvotes: 0

Adon Bilivit

Reputation: 27404

You can use the re module for this. For example:

import re

print(re.sub('[0-9_-]', '', 'dataset_2020-01-05.txt'))

Output:

dataset.txt
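Note that the character class [0-9_-] removes every digit, underscore and hyphen anywhere in the name, so a hypothetical file my-data_v2.txt would become mydatav.txt. A narrower sketch, assuming the timestamp is always _YYYY-MM-DD directly before the final extension, anchors the pattern with a lookahead:

```python
import re

# Assumption: the timestamp is "_YYYY-MM-DD" directly before the final extension.
# (?=\.[^.]+$) requires ".ext" to follow the date without consuming it.
PATTERN = r'_\d{4}-\d{2}-\d{2}(?=\.[^.]+$)'

for name in ["dataset_2020-01-05.txt", "my-data_v2.txt"]:
    print(re.sub(PATTERN, '', name))  # dataset.txt, then my-data_v2.txt unchanged
```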

Upvotes: 0

ahthserhsluk

Reputation: 1

If your file names contain dates from a known range, check whether a date is present first, and apply the stripping logic only when it is.

For example, if all your timestamped files contain '2020', check each item:

if '2020' in item:
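Combining that check with the split approach from the question gives roughly the following. This is a sketch: the helper name strip_if_dated is hypothetical, and it assumes, as the answer does, that every timestamp contains '2020'.

```python
def strip_if_dated(name):
    # Assumption: every timestamped file name contains '2020'.
    if '2020' in name:
        # rsplit with maxsplit=1 separates only the last extension
        stem, ext = name.rsplit('.', 1)
        return stem.split('_')[0] + '.' + ext
    # No date found: leave the name untouched, so nothing is appended twice
    return name

items = ["dataset_2020-01-05.txt", "dataset_2020-01-05.zip", "dataset.txt"]
print([strip_if_dated(item) for item in items])
```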

Upvotes: 0

Julius

Reputation: 91

for item in items:
    # rsplit('.', 1) splits only at the last dot, so extra dots don't break unpacking
    front, ext = item.rsplit('.', 1)
    print(front.split('_')[0] + '.' + ext)

or

for item in items:
    ext = item.split('.')[-1]
    front = item.split('.')[0].split('_')[0]
    print(front + '.' + ext)
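Both variants assume the name contains a dot: for a hypothetical name without an extension such as README_2020-01-05, the first raises ValueError and the second prints README.README_2020-01-05. A sketch using os.path.splitext, which returns an empty extension when there is no dot; the helper name strip_prefix_keep_ext is hypothetical, and like the answer it assumes the base name itself contains no underscore.

```python
import os

def strip_prefix_keep_ext(name):
    # Hypothetical helper: keep the part before the first '_' plus the extension.
    # Assumes the base name has no underscores of its own.
    stem, ext = os.path.splitext(name)  # ext is '' when there is no dot
    return stem.split('_')[0] + ext

for item in ["dataset_2020-01-05.txt", "dataset.txt", "README_2020-01-05"]:
    print(strip_prefix_keep_ext(item))
```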

Upvotes: 1
