Reputation: 8470
I need to extract just the numbers from file names such as:
GapPoints1.shp
GapPoints23.shp
GapPoints109.shp
How can I extract just the numbers from these files using Python? I'll need to incorporate this into a for loop.
Upvotes: 15
Views: 28994
Reputation: 9
Here is the code I used to move the publication year of a paper to the front of the filename after the file is downloaded from Google Scholar. The files are usually named Author+PublishedYear.pdf; after running this code, the filename becomes PublishedYear+Author.pdf.
# Renaming PDFs by number extraction
# Rename each PDF so that the digits of the publication year come first.
# Uses regular expressions.
# Import libraries
import re
import os

# Work in the current directory
address = os.getcwd()

# A class grouping two functions, one per filename pattern
class file_name:
    # First function, for names like schrodinger1990.pdf
    @staticmethod
    def number_extraction_pattern_non_digits_first(filename):
        match = re.search(r'^(\D+)(\d+)\.pdf$', filename, re.IGNORECASE)
        # Return (digits, non_digits), or None if the name does not fit
        return (match.group(2), match.group(1)) if match else None

    # Second function, for names like 1993schrodinger.pdf
    @staticmethod
    def number_extraction_pattern_digits_first(filename):
        match = re.search(r'^(\d+)(\D+)\.pdf$', filename, re.IGNORECASE)
        # Return (digits, non_digits), or None if the name does not fit
        return (match.group(1), match.group(2)) if match else None

if __name__ == '__main__':
    for filename in os.listdir(address):
        parts = file_name.number_extraction_pattern_non_digits_first(filename)
        if parts is not None:
            # Author+Year.pdf: move the year to the front
            digits, non_digits = parts
            os.rename(filename, digits + non_digits + '.pdf')
        elif file_name.number_extraction_pattern_digits_first(filename) is None:
            # Neither pattern fits, so leave the file alone
            print('Skipping ' + filename)
        # Otherwise the name is already Year+Author.pdf; nothing to do
Upvotes: 0
Reputation: 3361
If there is just one number:
''.join(filter(lambda x: x.isdigit(), filename))
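A short sketch of how this behaves on file names like the ones in the question, and why the "just one number" condition matters. The helper name digits_only and the file name Gap2Points23.shp are made up for illustration.

```python
def digits_only(filename):
    # Keep every character that is a digit and join them into one string.
    # ''.join(...) is needed on Python 3, where filter() returns an iterator.
    return ''.join(filter(str.isdigit, filename))


print(digits_only('GapPoints23.shp'))   # 23
print(digits_only('Gap2Points23.shp'))  # 223 (two separate numbers run together)
```

If a name can contain more than one number, a regular expression that finds each run of digits separately is the safer choice.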
Upvotes: 5
Reputation: 309919
You can use regular expressions:
import re
regex = re.compile(r'\d+')
Then to get the strings that match:
regex.findall(filename)
This will return a list of strings which contain the numbers. If you actually want integers, you could use int:
[int(x) for x in regex.findall(filename)]
If there's only one number in each filename, you could use regex.search(filename).group(0) (if you're certain that it will produce a match). If no match is found, that line raises an AttributeError saying that the NoneType object has no attribute group.
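One way to avoid that AttributeError is to check the match before calling group. This is only a sketch; the function name first_number and the file name README.txt are assumptions added for illustration.

```python
import re

NUMBER = re.compile(r'\d+')


def first_number(filename):
    """Return the first run of digits as an int, or None if there is none."""
    match = NUMBER.search(filename)
    return int(match.group(0)) if match else None


for name in ['GapPoints1.shp', 'GapPoints23.shp', 'GapPoints109.shp', 'README.txt']:
    # Names without any digits yield None instead of raising an exception.
    print(name, first_number(name))
```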
Upvotes: 28
Reputation: 17052
So, you haven't left any description of where these files are and how you're getting them, but I assume you'd get the filenames using the os module.
As for getting the numbers out of the names, you'd be best off using regular expressions with re, something like this:
import os
import re

def get_numbers_from_filename(filename):
    # Returns the first run of digits; raises AttributeError if there is none
    return re.search(r'\d+', filename).group(0)
Then, to include that in a for loop, you'd run that function on each filename:
for filename in os.listdir(myfiledirectory):
    print(get_numbers_from_filename(filename))
or something along those lines.
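For the shapefile case in the question, the loop might look roughly like the sketch below. It assumes the files sit in a single directory and that only names ending in .shp matter; the function name numbered_shapefiles is hypothetical.

```python
import os
import re


def numbered_shapefiles(directory):
    """Return (number, filename) pairs for the .shp files in directory, sorted by number."""
    pairs = []
    for filename in os.listdir(directory):
        stem, ext = os.path.splitext(filename)
        if ext.lower() != '.shp':
            continue  # skip companion files such as .dbf or .shx
        match = re.search(r'\d+', stem)
        if match:  # ignore shapefiles whose names contain no digits
            pairs.append((int(match.group(0)), filename))
    # Sorting on the integer puts GapPoints23 before GapPoints109.
    return sorted(pairs)
```

Calling numbered_shapefiles('.') would then give pairs you can unpack in a for loop, for example: for number, filename in numbered_shapefiles('.').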
Upvotes: 5