Reputation: 838
My data follows a repeated pattern in a text file. The same type of data structure, with unique values, is printed out until the end of the file:
{'AuthorSite': None,
'FirstText': None,
'Image': None,
'SrcDate': None,
'Title': None,
'Url': None}
...
..
.
I am trying to match each block one at a time using a regular expression in Sublime Text. I have tried a variety of forms with no success, the latest being:
\{(.|\s)\}
I want to hoover up everything between each pair of braces. Please advise. I will eventually implement this in Python.
Upvotes: 1
Views: 1862
Reputation: 67978
\{([^}]+)\}
You can try this demo:
http://regex101.com/r/hQ9xT1/32
import re
# Capture everything between a { and the next }
p = re.compile(r'\{([^}]+)\}')
test_str = "{'AuthorSite': None,\n 'FirstText': None,\n 'Image': None,\n 'SrcDate': None,\n 'Title': None,\n 'Url': None}"
print(p.findall(test_str))
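Your regex \{(.|\s)\} didn't work because the group is not quantified, so it only matches a single character between the braces. Use \{(?:.|\s)+?\} instead. The lazy +? stops at the first closing brace, whereas a greedy + would run from the first { to the last } in the file.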
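To make the effect of the quantifier concrete, here is a small sketch comparing the unquantified pattern with greedy and lazy versions. The sample string and variable names are made up for illustration; it assumes several blocks sit in the same text, as in the question.

```python
import re

# Hypothetical sample: two blocks shaped like the data in the question.
text = "{'Title': None,\n 'Url': None}\n{'Title': 'a',\n 'Url': 'b'}"

# No quantifier: requires exactly one character between the braces.
unquantified = re.findall(r'\{(.|\s)\}', text)
# Greedy +: consumes as much as possible, so it runs to the last }.
greedy = re.findall(r'\{(?:.|\s)+\}', text)
# Lazy +?: consumes as little as possible, so it stops at the first }.
lazy = re.findall(r'\{(?:.|\s)+?\}', text)

print(len(unquantified), len(greedy), len(lazy))  # 0 1 2
```

The greedy pattern returns a single match spanning both blocks, which is usually not what you want when the file holds many of them.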
Upvotes: 2
Reputation: 21
Assuming you want to retrieve the values, I would use the following regular expression:
\{([^\}]+)\}
The key here is the [^}] character class, which matches any character that isn't a literal }: whitespace, newlines, letters, digits, etc.
Here is the Python code:
import re
hoover_exp = re.compile(r'\{([^\}]+)\}')
# Read the whole file so matches can span multiple lines
with open('data.txt', 'r') as infile:
    text = infile.read()
matches = hoover_exp.findall(text)
matches will be a list of all the non-overlapping matches in text, e.g.:
["'AuthorSite': None,\n 'FirstText': None,\n 'Image': None,\n 'SrcDate': None,\n 'Title': None,\n 'Url': None", "'AuthorSite': None,\n 'FirstText': None,\n 'Image': None,\n 'SrcDate': None,\n 'Title': None,\n 'Url': None"]
That being said, if your input text is nothing but these dicts, you might be better off pulling them directly into Python dicts with something like ast.literal_eval. Note that json will not accept the single quotes or None as written.
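A minimal sketch of that direct-parsing idea, using ast.literal_eval from the standard library. It assumes no value contains a } character and that the dicts are not nested; the sample text and the names block_exp and records are hypothetical.

```python
import ast
import re

# Hypothetical sample standing in for the contents of the text file.
text = (
    "{'AuthorSite': None,\n 'Title': 'First',\n 'Url': None}\n"
    "{'AuthorSite': None,\n 'Title': 'Second',\n 'Url': None}\n"
)

# Match each block including its braces, so every match is a dict literal.
block_exp = re.compile(r'\{[^}]+\}')

# literal_eval only evaluates Python literals, so it is safer than eval.
records = [ast.literal_eval(block) for block in block_exp.findall(text)]

print([record['Title'] for record in records])  # ['First', 'Second']
```

Each element of records is an ordinary dict, so fields can be read by key instead of being re-parsed from the matched string.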
Upvotes: 1