user996018

Reputation: 71

Editing a text file using Python

I have an auto-generated bibliography file which stores my references. The citekeys in the generated file have the form xxxxx:2009tb. Is there a way to have a program detect this pattern and change each citekey to the form xxxxx:2009?

Upvotes: 0

Views: 483

Answers (2)

heltonbiker

Reputation: 27575

You actually just want to remove the two letters after the year in a reference. Supposing we can uniquely identify a reference as a colon followed by four digits and two letters, then the following regular expression would work (at least it works in this example code):

import re

s = """
according to some works (newton:2009cb), gravity is not the same that
severity (darwin:1873dc; hampton:1956tr).
"""

# raw strings keep \w and \1 intact for the regex engine
new_s = re.sub(r'(:[0-9]{4})\w{2}', r'\1', s)
print(new_s)

Explanation: match a colon ":" followed by four digits "[0-9]{4}" followed by any two "word" characters "\w{2}". The parentheses capture just the part you want to keep, and r'\1' replaces each whole match with the contents of the first (and only) group of parentheses. The r prefix makes the string a raw string, so \1 reaches the regex engine as a backreference instead of being interpreted by Python as an escape sequence. Note that \w also matches digits and underscores, so [a-z]{2} is a stricter choice if the suffix is always two lowercase letters.
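As a rough sketch of a stricter variant of the same idea, the pattern below also requires a key name before the colon and a word boundary after the two letters. It assumes the suffix is always exactly two lowercase letters; the names CITEKEY_SUFFIX and strip_citekey_suffix are made up for illustration.

```python
import re

# Assumption: a citekey is "name:YYYY" followed by exactly two lowercase letters.
# Group 1 captures "name:YYYY"; the trailing \b stops longer suffixes from matching.
CITEKEY_SUFFIX = re.compile(r'(\b\w+:[0-9]{4})[a-z]{2}\b')


def strip_citekey_suffix(text):
    """Return text with the two-letter suffix removed from each citekey."""
    return CITEKEY_SUFFIX.sub(r'\1', text)


sample = "see (newton:2009cb) and (darwin:1873dc; hampton:1956tr)"
print(strip_citekey_suffix(sample))
```

Keys that already lack the suffix, such as newton:2009, are left untouched, so running the function twice on the same text is harmless.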

Hope this helps!

Upvotes: 0

RParadox

Reputation: 6851

It's not quite clear to me which expression you want to match, but you can build everything with regular expressions, using import re and re.sub as shown. [0-9]{4} matches exactly four digits. (Edited to incorporate suggestions.)

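import re

inf = 'temp.txt'
outf = 'out.txt'

# the with statement closes both files, so no explicit close() is needed
with open(inf) as f, open(outf, 'w') as o:
    text = f.read()
    # keep the key and year (group 1) and drop the trailing "tb"
    text = re.sub(r"(xxxxx:[0-9]{4})tb", r"\1", text)  # adapt the regex to your keys
    o.write(text)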
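If it helps to know how many keys were rewritten, re.subn returns the replacement count along with the new text. The following is only a sketch: the function name fix_citekeys, the file names, and the BibTeX-style sample entry are assumptions for the demo, not details from the question.

```python
import os
import re
import tempfile

# Colon, four digits (kept as group 1), then two lowercase letters to drop.
PATTERN = re.compile(r'(:[0-9]{4})[a-z]{2}\b')


def fix_citekeys(in_path, out_path):
    """Copy in_path to out_path with citekey suffixes removed; return the count."""
    with open(in_path) as src:
        text = src.read()
    new_text, count = PATTERN.subn(r'\1', text)
    with open(out_path, 'w') as dst:
        dst.write(new_text)
    return count


# Demo on a throwaway file (the entry format is an assumption).
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'refs.bib')
out_path = os.path.join(workdir, 'refs_fixed.bib')
with open(in_path, 'w') as f:
    f.write("@article{newton:2009tb,\n  year = {2009}\n}\n")

changed = fix_citekeys(in_path, out_path)
print(changed)
```

Writing to a separate output file keeps the original bibliography intact in case the pattern matches something it should not.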

Upvotes: 1
