Jannat Arora
Reputation: 2989

Retaining split characters

I have the following data:

<http://dbpedia.org/data/Plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerInfo> _:header16125770191335188966549 <http://dbpedia.org/data/Plasmodium_hegneri.xml> .
_:header16125770191335188966549 <http://www.w3.org/2006/http#responseCode> "200"^^<http://www.w3.org/2001/XMLSchema#integer> <http://dbpedia.org/data/Plasmodium_hegneri.xml> .
_:header16125770191335188966549 <http://www.w3.org/2006/http#date> "Mon, 23 Apr 2012 13:49:27 GMT" <http://dbpedia.org/data/Plasmodium_hegneri.xml> .
_:header16125770191335188966549 <http://www.w3.org/2006/http#content-type> "application/rdf+xml; charset=UTF-8" <http://dbpedia.org/data/Plasmodium_hegneri.xml> .

I want to transform this data into the following form: for each line, the last string enclosed in < > is removed from the line and written on its own line just before it, prefixed with #@.

#@ <http://dbpedia.org/data/Plasmodium_hegneri.xml>
<http://dbpedia.org/data/Plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerInfo> _:header16125770191335188966549 .
#@ <http://dbpedia.org/data/Plasmodium_hegneri.xml>
_:header16125770191335188966549 <http://www.w3.org/2006/http#responseCode> "200"^^<http://www.w3.org/2001/XMLSchema#integer> .
#@ <http://dbpedia.org/data/Plasmodium_hegneri.xml>
_:header16125770191335188966549 <http://www.w3.org/2006/http#date> "Mon, 23 Apr 2012 13:49:27 GMT" .
#@ <http://dbpedia.org/data/Plasmodium_hegneri.xml>
_:header16125770191335188966549 <http://www.w3.org/2006/http#content-type> "application/rdf+xml; charset=UTF-8" .

I wrote the following Python code to do this:

infile = open('testnq.nq', 'r')
outfile= open('outFile.ttl','w')
while True:
    inFileLine1=infile.readline()
    if not inFileLine1:
        break #EOF
    splitString=inFileLine1.split(' ')
    line1= "#@ " + splitString[len(splitString)-2]
    outfile.write(line1)
    line2=""
    for num in range (0,len(splitString)-2):
        line2= line2 + splitString[num]
    outfile.write(line2)

outfile.close()

But the spaces do not end up in the desired places. Can someone please suggest how I can do this in Python or with Linux commands?

Upvotes: 1

Views: 59

Answers (1)

user707650

At the risk of complicating things with a regular expression, this may work:

import re

# The greedy .* in "before" makes "match" the last <...> on the line.
pattern = re.compile(r'^(?P<before>.*)(?P<match><[^>]+>)(?P<after>[^<]*)$')
replacement = r'#@ \g<match>\n\g<before>\g<after>'

line = """<http://dbpedia.org/data/Plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerInfo> _:header16125770191335188966549 <http://dbpedia.org/data/Plasmodium_hegneri.xml> ."""
print(pattern.sub(replacement, line))

line = """_:header16125770191335188966549 <http://www.w3.org/2006/http#responseCode> "200"^^<http://www.w3.org/2001/XMLSchema#integer> <http://dbpedia.org/data/Plasmodium_hegneri.xml> ."""
print(pattern.sub(replacement, line))

which outputs:

#@ <http://dbpedia.org/data/Plasmodium_hegneri.xml>
<http://dbpedia.org/data/Plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerInfo> _:header16125770191335188966549  .
#@ <http://dbpedia.org/data/Plasmodium_hegneri.xml>
_:header16125770191335188966549 <http://www.w3.org/2006/http#responseCode> "200"^^<http://www.w3.org/2001/XMLSchema#integer>  .
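
Note the doubled space before the final "." in this output: the spaces on both sides of the removed match are kept. The code in the question loses its spaces for a different reason: split(' ') discards the separators, and the loop then concatenates the pieces without putting them back. A minimal sketch that keeps the split-based approach and restores the separators with ' '.join might look like the following. It assumes every non-blank line ends with the string in < > followed by a lone ".", as in the sample data; the names move_context_first and convert are made up for illustration.

```python
def move_context_first(line):
    """Return '#@ <context>' on one line, then the statement without it.

    Assumption: the second-to-last space-separated token is the <...>
    string to move, and the last token is the terminating '.'.
    """
    tokens = line.strip().split(" ")
    context = tokens[-2]
    # ' '.join puts back the single spaces that split(' ') removed.
    statement = " ".join(tokens[:-2] + tokens[-1:])
    return "#@ " + context + "\n" + statement + "\n"


def convert(in_path, out_path):
    """Rewrite every non-blank line of in_path into out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(move_context_first(line))
```

With the file names from the question, this would be called as convert('testnq.nq', 'outFile.ttl'). Spaces inside quoted literals such as "Mon, 23 Apr 2012 13:49:27 GMT" survive because the tokens are joined with the same separator they were split on.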

Upvotes: 1
