IamH1kc

Reputation: 6762

How to use regex in python in getting a string between two characters?

I have this as my input

content = 'abc.zip'\n

I want to extract abc from it. How do I do this using a regex in Python?

Edit:

No, this is not a homework question. I am trying to automate a task, and I am stuck at the point of making the automation generic enough to work with any zip file I have.

os.system('python unzip.py -z data/ABC.zip -o data/')

After I receive the zip file, I unzip it. I plan to make this generic by getting the file name from the directory the zip file was placed in, and then passing that file name to the command above to unzip it.
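A minimal sketch of the generic version described above, assuming the zip files sit in `data/` and that `unzip.py` is the asker's own script accepting the `-z` and `-o` flags exactly as in the command above. The helper names `zip_stem` and `build_unzip_command` are hypothetical. It uses `glob` and `os.path` rather than a regex, and `subprocess.run` with an argument list instead of `os.system`, which avoids shell-quoting problems with unusual file names.

```python
import glob
import os.path
import subprocess
import sys


def zip_stem(zip_path):
    """Return the file name without directory or extension, e.g. 'ABC'."""
    # Hypothetical helper: basename drops 'data/', splitext drops '.zip'
    return os.path.splitext(os.path.basename(zip_path))[0]


def build_unzip_command(zip_path, out_dir):
    """Build the argument list for unzip.py (the -z/-o flags are assumed from the question)."""
    # sys.executable is the running Python interpreter, used instead of a bare 'python'
    return [sys.executable, "unzip.py", "-z", zip_path, "-o", out_dir]


if __name__ == "__main__":
    # Unzip every .zip file found in data/ into data/
    for path in sorted(glob.glob(os.path.join("data", "*.zip"))):
        print("Unzipping", zip_stem(path))
        subprocess.run(build_unzip_command(path, "data/"), check=True)
```

With `check=True`, a failing `unzip.py` run raises `subprocess.CalledProcessError` instead of being silently ignored, which is usually what you want in an automation script.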

Upvotes: 0

Views: 492

Answers (3)

rzetterberg

Reputation: 10268

Edit: Added a regex that matches "content = 'abc.zip\n'" in addition to the one for the plain string "abc.zip".

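import re

# Matching for "content = 'abc.zip\n'": capture the text between the opening
# quote and ".zip" (the dot is escaped so it only matches a literal period)
matches = re.search(r"'(?P<filename>[^']*)\.zip\n'$", "content = 'abc.zip\n'")
matches = matches.groupdict()
print(matches)

# Matching for "abc.zip": capture everything before the trailing ".zip"
matches = re.match(r"(?P<filename>.*)\.zip$", "abc.zip")
matches = matches.groupdict()
print(matches)

Output:

{'filename': 'abc'}
{'filename': 'abc'}

Both examples capture the named group filename, i.e. the part of the name before .zip. The first one uses re.search, because re.match anchors at the start of the string and a greedy .* would then capture content = 'abc as well. The result of groupdict() is a regular dictionary, so you can read the value with matches['filename'].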
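The question title asks more generally for the text between two characters. A minimal sketch of that, assuming the delimiters are literal characters and that only the first match is wanted; the helper name `between` is hypothetical. `re.escape` makes characters such as `.` or `[` literal, and the non-greedy `.*?` stops at the first closing delimiter.

```python
import re


def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    # re.escape turns '.', '(' etc. into literal characters;
    # the non-greedy (.*?) stops at the first `end` after `start`
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(between("content = 'abc.zip'", "'", "."))  # abc
print(between("abc.zip", "(", ")"))  # None
```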

Upvotes: 1

Blair

Reputation: 15788

As I implied in my comment, regular expressions are unlikely to be the best tool for the job (unless there is some artificial restriction on the problem, or it is far more complex than your example). Python's built-in string methods and the os.path module provide functions that should do what you are after. To better illustrate how these work, I'll use the following definition of content instead:

>>> content = 'abc.def.zip'

If it's a file name and you want the name and extension:

>>> import os.path
>>> filename, extension = os.path.splitext(content)
>>> print(filename)
abc.def
>>> print(extension)
.zip

If it is a string, and you want to remove the substring 'abc':

>>> noabc = content.replace('abc', '')
>>> print(noabc)
.def.zip

If you want to break it up on each occurrence of a period:

>>> broken = content.split('.')
>>> print(broken)
['abc', 'def', 'zip']

If it has multiple periods, and you want to break it on the first or last one:

>>> broken = content.split('.', 1)
>>> print(broken)
['abc', 'def.zip']
>>> broken = content.rsplit('.', 1)
>>> print(broken)
['abc.def', 'zip']

Upvotes: 4

Mr Fooz

Reputation: 111856

If you're trying to break up parts of a path, you may find the os.path module to be useful. It has nice abstractions with clear semantics that are easy to use.
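A short sketch of the `os.path` functions most relevant here, applied to the path from the question's command (`data/ABC.zip`). The outputs in the comments assume a POSIX-style `/` separator, as on Linux or macOS; on Windows, `os.path.join` would produce `data\ABC.txt` instead.

```python
import os.path

path = "data/ABC.zip"

directory, filename = os.path.split(path)     # ('data', 'ABC.zip')
name, extension = os.path.splitext(filename)  # ('ABC', '.zip')

print(directory)  # data
print(name)       # ABC
print(extension)  # .zip

# Rebuild a path in the same directory with a different extension
print(os.path.join(directory, name + ".txt"))  # data/ABC.txt
```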

Upvotes: 0
