Reputation: 35
I am trying to iterate through the items in Python and remove the timestamp from each filename while keeping the extension:
for item in items:
    print(item.split('_')[0])
This removes the timestamp, but it deletes the extension as well. The strings look like dataset_2020-01-05.txt, and I need them to become dataset.txt (likewise, dataset_2020-01-05.zip -> dataset.zip).
I also tried this:
for item in items:
    print(item.split('_')[0] + '.' + item.split('.')[-1])
but some files don't have a timestamp, and this appends the extension to those as well, so I end up with names like dataset.txt.txt.
Upvotes: 0
Views: 198
Reputation: 297
To remove the timestamp, match the date pattern with the re module and strip the match from each entry in the items list.
import re
items = ["dataset_2020-01-05.txt", "dataset_2020-01-05.zip", "dataset.txt"]
for i, item in enumerate(items):
    match = re.search(r'_\d{4}-\d{2}-\d{2}', item)
    if match:
        items[i] = item.replace(match.group(), '')
print(items)
Output
['dataset.txt', 'dataset.zip', 'dataset.txt']
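The same pattern can also be applied with re.sub, which returns the string unchanged when there is no match, so the explicit if check is not needed. A minimal sketch, assuming the timestamp always has the form _YYYY-MM-DD:

```python
import re

items = ["dataset_2020-01-05.txt", "dataset_2020-01-05.zip", "dataset.txt"]

# re.sub leaves strings without a match untouched,
# so names without a timestamp pass through as-is.
cleaned = [re.sub(r'_\d{4}-\d{2}-\d{2}', '', item) for item in items]
print(cleaned)  # ['dataset.txt', 'dataset.zip', 'dataset.txt']
```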
Upvotes: 0
Reputation: 27404
You can use the re module for this. For example:
import re
print(re.sub('[0-9_-]', '', 'dataset_2020-01-05.txt'))
Output:
dataset.txt
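Note that the character class [0-9_-] removes every digit, underscore and hyphen anywhere in the name, not just the timestamp. The sketch below contrasts it with a pattern anchored to a date directly before the extension; the filename sales_q1_2020-01-05.txt is a hypothetical example.

```python
import re

name = "sales_q1_2020-01-05.txt"  # hypothetical filename for illustration

# Character class: strips all digits, underscores and hyphens anywhere.
broad = re.sub(r'[0-9_-]', '', name)

# Anchored: strips only "_YYYY-MM-DD" immediately before the extension.
narrow = re.sub(r'_\d{4}-\d{2}-\d{2}(?=\.[^.]+$)', '', name)

print(broad)   # salesq.txt
print(narrow)  # sales_q1.txt
```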
Upvotes: 0
Reputation: 1
If your dates fall within a known range, you could first check whether a date is present and apply the logic only when it is.
For example, if all your timestamped files contain '2020', check each filename:
if '2020' in item:
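Instead of testing for a fixed year, the presence check can use a date pattern, so the logic applies only to names that actually carry a timestamp, whatever the year. A sketch, assuming the timestamp is _YYYY-MM-DD directly before the extension; the name TIMESTAMP is hypothetical.

```python
import re

# Assumed format: "_YYYY-MM-DD" right before the extension.
TIMESTAMP = re.compile(r'_(\d{4}-\d{2}-\d{2})(\.[^.]+)$')

items = ["dataset_2020-01-05.txt", "dataset_2019-12-31.zip", "dataset.txt"]

result = []
for item in items:
    match = TIMESTAMP.search(item)
    if match:
        # Keep everything before the timestamp plus the captured extension.
        result.append(item[:match.start()] + match.group(2))
    else:
        # No timestamp: leave the name untouched.
        result.append(item)

print(result)  # ['dataset.txt', 'dataset.zip', 'dataset.txt']
```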
Upvotes: 0
Reputation: 91
for item in items:
    # rsplit with maxsplit=1 splits only at the last dot
    front, ext = item.rsplit('.', 1)
    print(front.split('_')[0] + '.' + ext)
or
for item in items:
    ext = item.split('.')[-1]
    front = item.split('.')[0].split('_')[0]
    print(front + '.' + ext)
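Both variants assume every name contains a dot: a name such as README makes the first raise ValueError and the second print README.README. The standard library's os.path.splitext returns an empty extension in that case. A sketch that keeps this answer's logic of cutting at the first underscore (so, like it, it also shortens names that contain an underscore for other reasons):

```python
import os

items = ["dataset_2020-01-05.txt", "dataset_2020-01-05.zip", "dataset.txt", "README"]

cleaned = []
for item in items:
    # splitext returns (root, ext); ext is '' when there is no dot.
    root, ext = os.path.splitext(item)
    # Drop everything from the first underscore, then re-attach the extension.
    cleaned.append(root.split('_')[0] + ext)

print(cleaned)  # ['dataset.txt', 'dataset.zip', 'dataset.txt', 'README']
```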
Upvotes: 1