I'm trying to split multiple lines of a segment from a TTL (Turtle) document. Here's the relevant code:
entry_obj = str(Entry(*re.findall(r'([;\s]+[^\s+|\s+$])', ''.join(buf))))
yield process_entry_obj(entry_obj)
The code raises an error: the regex does not split the string as intended, so the number of matches (and therefore the number of arguments passed to Entry) is different for every entry, and the code doesn't run.
Below is the format of my input file:
## http://www.example.com/abc#AAA
pms:ecCreatedBy rms:type ;
rmfs:lag "Ersteller"@newyork ,
"AAA"@wdc .
The file contains multiple entries like the one above.
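One likely cause of the changing match count: inside square brackets, [^\s+|\s+$] is a single negated character class (any one character that is not whitespace, +, | or $), not an alternation with anchors. Each match is therefore a run of ; or whitespace plus the next single character, so the number of matches follows the number of whitespace-separated tokens in the entry. A small sketch on the sample entry above (the entry string is simply that sample typed in):

```python
import re

# The sample entry from the question, typed in as a string
entry = (
    'pms:ecCreatedBy rms:type ;\n'
    'rmfs:lag "Ersteller"@newyork ,\n'
    '"AAA"@wdc .'
)

# The pattern from the question: a run of ";"/whitespace plus ONE following character
matches = re.findall(r'([;\s]+[^\s+|\s+$])', entry)
print(matches)
# [' r', ' ;\nr', ' "', ' ,', '\n"', ' .']
print(len(matches))
# 6, one per separator run, so the count changes from entry to entry
```

Since Entry(*...) receives one argument per match, its argument count changes with every entry that has a different number of tokens.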
You may use
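import re

filepath = 'input.ttl'  # hypothetical path, replace with your own file
# Read the file contents
with open(filepath, 'r') as fr:
    s = fr.read()
# 1) Move the @en label in front of the non-English one
s = re.sub(r'(?m)(rmfs:label\s*)("[^"]*"@(?!en)\w*)(\s*,\s*)("[^"]*"@en) \.$', r'\1\4\3\2 .', s)
# 2) Replace the fragment after "#" in the header line with the English label
s = re.sub(r'(?m)^(\s*###\s*http.*/v\d+#)\w*((?:\n(?!\n).*)*rmfs:label\s*")([^"]*)("@en)', r'\1\3\2\3\4', s)
# Write to file:
with open(filepath, 'w') as fw:
    fw.write(s)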
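To check what the first substitution does without rewriting a file, here is a minimal sketch that applies it to an in-memory string. The entry text is hypothetical: it assumes the real file uses rmfs:label with a non-English label listed before the @en one, which is the shape the pattern expects (the sample in the question uses rmfs:lag and other language tags, so the pattern would not match it as-is).

```python
import re

# Hypothetical entry: a non-English label listed before the English one
entry = (
    'rmfs:label "Ersteller"@de ,\n'
    '    "Creator"@en .'
)

# First substitution from the answer: swap the two literals so @en comes first
swapped = re.sub(
    r'(?m)(rmfs:label\s*)("[^"]*"@(?!en)\w*)(\s*,\s*)("[^"]*"@en) \.$',
    r'\1\4\3\2 .',
    entry,
)
print(swapped)
# rmfs:label "Creator"@en ,
#     "Ersteller"@de .
```

Running the substitution a second time leaves the text unchanged, because once the @en literal comes first the (?!en) lookahead in Group 2 no longer lets the pattern match.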
Regex 1 details
(?m) - multiline mode, so $ matches at the end of each line
(rmfs:label\s*) - Group 1 (\1): rmfs:label followed by 0+ whitespace chars
("[^"]*"@(?!en)\w*) - Group 2 (\2): a ", 0+ chars other than ", then "@, a negative lookahead ensuring en does not immediately follow, and then 0+ word chars
(\s*,\s*) - Group 3 (\3): a comma enclosed in 0+ whitespace chars
("[^"]*"@en) - Group 4 (\4): a ", 0+ chars other than ", then " and @en
\.$ (preceded by a space in the pattern) - a space, a literal . and then the end of the line
The replacement \1\4\3\2 . writes the groups back with Group 2 and Group 4 swapped, so the @en literal comes first.
Regex 2 details
(?m) - multiline mode, so ^ matches at the start of each line
^ - start of a line
(\s*###\s*http.*/v\d+#) - Group 1: 0+ whitespace chars, ###, 0+ whitespace chars, http, any 0+ chars other than line breaks, /v, 1+ digits and #
\w* - 0+ word chars
((?:\n(?!\n).*)*rmfs:label\s*") - Group 2: any number of following lines, stopping before a blank line ((?:\n(?!\n).*)*), and then rmfs:label, 0+ whitespace chars and "
([^"]*) - Group 3: 0+ chars other than "
("@en) - Group 4: the "@en substring
The replacement \1\3\2\3\4 puts the Group 3 text (the English label) in place of the \w* match after #.
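A minimal sketch of the second substitution on a hypothetical entry (the ### header, the /v1# path and the labels are made up to fit the pattern, since the question's sample uses ## and no /v<digits>#). Note that it only matches when the @en literal directly follows rmfs:label, which is why the first substitution has to run before it.

```python
import re

# Hypothetical entry in the shape the pattern expects: a "### http.../v<digits>#"
# header, then (before any blank line) rmfs:label with the @en literal first
entry = (
    '### http://www.example.com/v1#AAA\n'
    'pms:ecCreatedBy rms:type ;\n'
    '    rmfs:label "Creator"@en ,\n'
    '        "Ersteller"@de .'
)

# Second substitution from the answer: the \w* after "#" is dropped and
# Group 3 (the English label text) is written in its place
renamed = re.sub(
    r'(?m)^(\s*###\s*http.*/v\d+#)\w*((?:\n(?!\n).*)*rmfs:label\s*")([^"]*)("@en)',
    r'\1\3\2\3\4',
    entry,
)
print(renamed.splitlines()[0])
# ### http://www.example.com/v1#Creator
```

The rest of the entry is written back unchanged, because Groups 2 to 4 reproduce the text they matched.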
From what I understand, you need the regex \s*;\s*
Explanation:
\s* - match a whitespace character zero or more times
; - match ; literally
\s* - match a whitespace character zero or more times again
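Assuming the pattern is meant to be passed to re.split (the answer does not say which function to use), a minimal sketch on the sample entry from the question:

```python
import re

# The sample entry from the question
entry = (
    'pms:ecCreatedBy rms:type ;\n'
    'rmfs:lag "Ersteller"@newyork ,\n'
    '"AAA"@wdc .'
)

# Split on ";" together with any whitespace (including newlines) around it
parts = re.split(r'\s*;\s*', entry)
print(parts)
# ['pms:ecCreatedBy rms:type', 'rmfs:lag "Ersteller"@newyork ,\n"AAA"@wdc .']
```

The number of parts is always the number of ; separators plus one, so it no longer depends on how many tokens each line contains.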