Reputation: 3
import shlex
fil=open("./demoshlex.txt",'r')
line=fil.readline()
print line
print shlex.split(line)
Suppose my string is as below in a text file:
line1:
asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something
I want to split the line and form a list as follows:
[asfdsafadfa, "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'.", is something]
I tried using shlex.split (code above), but it raised an exception:
**Output:**
python basicshelx.py
asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
Traceback (most recent call last):
File "basicshelx.py", line 5, in <module>
print shlex.split(line)
File "/home/siddhant/sid/.local/lib/python2.7/shlex.py", line 279, in split
return list(lex)
File "/home/siddhant/sid/.local/lib/python2.7/shlex.py", line 269, in next
token = self.get_token()
File "/home/siddhant/sid/.local/lib/python2.7/shlex.py", line 96, in get_token
raw = self.read_token()
File "/home/siddhant/sid/.local/lib/python2.7/shlex.py", line 172, in read_token
raise ValueError, "No closing quotation"
ValueError: No closing quotation
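For reference, the error comes from shlex applying shell-style quoting rules: each `"` opens or closes a quoted section, and the input line contains five double quotes, so the last one is never closed. A minimal reproduction in Python 3, assuming the line from the question as input:

```python
import shlex

# Input line from the question: it contains five double quotes, so one is unbalanced
line = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''

error_message = None
try:
    shlex.split(line)
except ValueError as exc:
    # shlex pairs up quotes like a POSIX shell; the fifth " never finds a partner
    error_message = str(exc)

print(line.count('"'))  # 5
print(error_message)    # No closing quotation
```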
Upvotes: 0
Views: 9613
Reputation: 2488
It seems to me that you want to split only on the first occurrence of " and keep all the remaining " characters in the second element of your output list.
Here is an example using only built-in string methods; no imports are needed:
result = []
with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on "
        my_list = line.split('"')
        # only append the first element of the list to the result
        result.append(my_list[0].strip())
        # rebuild the second part, adding back in the "
        remainder = '"' + '"'.join(my_list[1:])
        # append the second part to the result
        result.append(remainder)
print(result)
output:
['asfdsafadfa', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."']
Or, if you print the individual elements of the output list:
for e in result:
    print(e)
output:
asfdsafadfa
"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
[Edit based on comment]
As suggested in the comments, you can use .split('"', 1), which splits only on the first ", for example:
with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on " but only the first one
        result = line.split('"', 1)
        # add the " back in for the second element
        result[1] = '"' + result[1]
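An alternative sketch (not from the comments) uses str.partition, which also splits at the first occurrence of the separator but always returns a 3-tuple. That avoids the IndexError that `result[1]` raises when a line contains no `"` at all. The function name `split_first_quote` is hypothetical:

```python
def split_first_quote(line):
    """Split a line at its first double quote, keeping the quote on the second part."""
    # partition returns (before, separator, after); separator is '' if no quote is found
    head, sep, tail = line.strip().partition('"')
    if not sep:
        # no double quote in the line: return it as a one-element list
        return [head]
    return [head.strip(), sep + tail]


line = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."\n'''
print(split_first_quote(line))
print(split_first_quote("no quotes here\n"))  # ['no quotes here']
```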
[Edit based on updated question and comment]
Comment from OP:
I want only the quoted part i.e remove "is something" from that element of result List and make it [2] element
Since the question has been updated with a trailing "is something" string in the input, which needs to be omitted from the output, the example now becomes:
with open('test.txt', 'r') as openfile:
    for line in openfile:
        # strip spaces and \n from the line
        line = line.strip()
        # split the line on " but only the first one
        result = line.split('"', 1)
        # add the " back in for the second element and remove the trailing string
        result[1] = '"{}"'.format(result[1].rsplit('"', 1)[0])
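The snippet above drops the trailing text. If you want to keep it as the third element, as in the list the question asks for, one option is to combine str.partition (first quote) with str.rpartition (last quote). This is a sketch: the helper name `split_three_parts` is made up for illustration, and it assumes each line contains at least two double quotes:

```python
def split_three_parts(line):
    """Return [text before first ", text from first to last " inclusive, text after last "]."""
    # Assumes the line contains at least two double quotes
    head, _, rest = line.strip().partition('"')
    quoted, _, tail = rest.rpartition('"')
    return [head.strip(), '"' + quoted + '"', tail.strip()]


line = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''
for part in split_three_parts(line):
    print(part)
```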
However, a file is likely to contain multiple lines. If so, you need to build up a list of outputs, one for each line. The example then becomes:
result = []
with open('test.txt', 'r') as openfile:
    for line in openfile:
        if '"' in line:
            # we can split the line on the first "
            line = line.strip().split('"', 1)
            if line[1].endswith('"'):
                # no trailing string to remove
                # prefix the second element with "
                line[1] = '"{}'.format(line[1])
            elif '"' in line[1]:
                # trailing string to be removed with .rsplit()[0]
                # add " before and after the second element
                line[1] = '"{}"'.format(line[1].rsplit('"', 1)[0])
        else:
            # no " in the line, return the line as a one-element list
            line = [line.strip()]
        result.append(line)
# result is now a list of lists
for line in result:
    for e in line:
        print(e)
Upvotes: 1
Reputation: 7058
A regular expression (the re module) handles this well:
import re

s = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''
pat = re.compile(
    r'''
    ^       # beginning of a line
    (.*?)   # first part; the *? means non-greedy
    (".*")  # part between the outermost " characters (quotes included)
    (.*?)   # last part
    $       # end of a line
    ''', re.DOTALL | re.VERBOSE)
pat.match(s).groups()
('asfdsafadfa ',
'"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."',
' is something')
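The key to the pattern is that the middle group is greedy: `.*` runs as far as it can and then backtracks to the last `"` on the line, whereas a non-greedy `".*?"` stops at the second quote. A small comparison on the same input (Python 3):

```python
import re

s = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something'''

# Greedy: from the first " to the last " on the line
greedy = re.search(r'".*"', s).group()
# Non-greedy: from the first " to the next " only
lazy = re.search(r'".*?"', s).group()

print(greedy)
print(lazy)  # "Tabvxc "
```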
Put together, this becomes:
import re
from io import StringIO

test_str = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something
asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
asfdsafadfa Tabvxc avcxsdasaf sadasfdf. sdsadsaf '0000000000000000000000000000000'.
'''

def split_lines(filehandle):
    pat = re.compile(r'''^(.*?)(".*")(.*?)$''', re.DOTALL)
    for line in filehandle:
        match = pat.match(line)
        if match:
            yield match.groups()
        else:
            yield line

with StringIO(test_str) as openfile:
    for line in split_lines(openfile):
        print(line)
Output:
('asfdsafadfa ', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."', ' is something')
('asfdsafadfa ', '"Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf \'0000000000000000000000000000000\'."', '')
asfdsafadfa Tabvxc avcxsdasaf sadasfdf. sdsadsaf '0000000000000000000000000000000'.
The generator iterates over the open file handle line by line and tries to split each line with the pattern. If the match succeeds, it yields a tuple with the three parts; otherwise it yields the original string.
In your actual program, replace StringIO(test_str) with open(filename, 'r').
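If you would rather get the same shape for every line (a list of stripped parts, as in the question), a variant of the generator could look like the following. It is a sketch: the name `split_lines_as_lists` is hypothetical, and io.StringIO stands in for a real file:

```python
import re
from io import StringIO

# Same idea as the pattern above: text before, outermost quoted part, text after
PATTERN = re.compile(r'^(.*?)(".*")(.*?)$', re.DOTALL)


def split_lines_as_lists(filehandle):
    """Yield a list of stripped parts per line; unmatched lines give a one-element list."""
    for line in filehandle:
        match = PATTERN.match(line)
        if match:
            # strip surrounding whitespace and drop any empty parts
            yield [part.strip() for part in match.groups() if part.strip()]
        else:
            yield [line.strip()]


test_str = '''asfdsafadfa "Tabvxc "avcx"sdasaf" sadasfdf. sdsadsaf '0000000000000000000000000000000'." is something
asfdsafadfa Tabvxc avcxsdasaf sadasfdf. sdsadsaf '0000000000000000000000000000000'.
'''

with StringIO(test_str) as openfile:
    rows = list(split_lines_as_lists(openfile))

for row in rows:
    print(row)
```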
Upvotes: 1
Reputation: 1037
Your original string seems to be badly quoted to begin with. Inside a Python string literal, you can escape quotes by preceding them with a backslash (\), like so:
my_var = "Tabvxc \"avcx\"sdasaf\" sadasfdf. sdsadsaf '0000000000000000000000000000000'."
You can then split it like so:
my_var.split('"')
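Note that the backslashes exist only in the source code; the resulting string value contains plain `"` characters. A quick check of what the split returns (Python 3):

```python
my_var = "Tabvxc \"avcx\"sdasaf\" sadasfdf. sdsadsaf '0000000000000000000000000000000'."

# The escapes are resolved when the literal is parsed, so no backslashes remain
print('\\' in my_var)  # False

parts = my_var.split('"')
print(len(parts))  # 4
print(parts[:3])   # ['Tabvxc ', 'avcx', 'sdasaf']
```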
Upvotes: 0