user2377057

Reputation: 183

Python split string on quotes

I'm a Python learner. If I have lines of text in a file that look like this

"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"

Can I split the lines around the inverted commas? The only constant would be their position in the file relative to the data lines themselves. The data lines could range from 10 to 100+ characters (they'll be nested network folders). I cannot see any other markers I could split on, but my lack of Python knowledge is making this difficult. I've tried

optfile=line.split("")

and other variations but keep getting ValueError: empty separator. I can see why it's saying that, I just don't know how to change it. Any help is, as always, very much appreciated.

Many thanks

Upvotes: 9

Views: 43222

Answers (10)

Frank from Frankfurt

Reputation: 218

The following code splits the line at each occurrence of the inverted comma character (") and removes empty strings and those consisting only of whitespace.

[s for s in line.split('"') if s.strip() != '']

There is no need to use regular expressions, an escape character, some module or assume a certain number of whitespace characters between the paths.

Test:

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
output = [s for s in line.split('"') if s.strip() != '']
print(output)
>>> ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
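As a small sketch of the point about whitespace, the same comprehension can be wrapped in a helper and fed a line whose paths are separated by a tab and several spaces instead of a single space. The name split_on_quotes is made up here for illustration and is not part of the answer above.

```python
def split_on_quotes(line):
    """Return the non-blank pieces of `line` that sit between double quotes."""
    return [s for s in line.split('"') if s.strip() != '']

# The question's two paths, separated by a tab plus spaces, with a trailing newline.
line = r'"Y:\DATA\00001\SERVER\DATA.TXT"' + '\t   ' + r'"V:\DATA2\00002\SERVER2\DATA2.TXT"' + '\n'
paths = split_on_quotes(line)
print(paths)
```

The separator and the trailing newline are both whitespace-only pieces, so the filter drops them along with the empty string before the first quote.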

Upvotes: 2

OldSteve

Reputation: 608

My question Python - Error Caused by Space in argv Arument was marked as a duplicate of this one. We have a number of Python books going back to Python 2.3. The oldest referred to using a list for argv, but with no example, so I changed things to:-

repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
SWCore.main(repoCmd)

and in SWCore to:-

sys.argv = args

The shlex module worked but I prefer this.

Upvotes: -1

D'Arcy

Reputation: 423

I know this got answered a million years ago, but this works too:

input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
input = input.replace('" "','"').split('"')[1:-1]

This should output a list containing:

['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

Upvotes: 0

Kashif Siddiqui

Reputation: 1546

The shlex module can help you.

import shlex

my_string = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
shlex.split(my_string)

This will output

['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']

Reference: https://docs.python.org/2/library/shlex.html

Upvotes: 4

mckelvin

Reputation: 4058

No regex, no split, just use csv.reader

import csv

sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'

def main():
    for l in csv.reader([sample_line], delimiter=' ', quotechar='"'):
        print(l)

main()

The output is

['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-']
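For the paths in the question, the same approach might look like the sketch below. It assumes exactly one space between the quoted paths and no trailing newline; the variable names are illustrative only.

```python
import csv

# The question's sample line, as a raw string so the backslashes stay literal.
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# csv.reader expects an iterable of lines, so wrap the single line in a list.
rows = list(csv.reader([line], delimiter=' ', quotechar='"'))
paths = rows[0]
print(paths)
```

With the default dialect settings no escape character is defined, so the backslashes in the paths pass through unchanged.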

Upvotes: 7

Jon Clements

Reputation: 142136

I'll just add that if you were dealing with lines that look like they could be command line parameters, then you could possibly take advantage of the shlex module:

import shlex

with open('somefile') as fin:
    for line in fin:
        print(shlex.split(line))

Would give:

['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
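One caveat worth checking before relying on this for Windows paths: shlex.split defaults to POSIX rules, where a backslash outside quotes is an escape character. A minimal sketch of the difference, using the question's first path (the variable names are illustrative):

```python
import shlex

quoted = r'"Y:\DATA\00001\SERVER\DATA.TXT"'
unquoted = r'Y:\DATA\00001\SERVER\DATA.TXT'

# Inside double quotes, a backslash is kept unless it precedes " or another backslash.
from_quoted = shlex.split(quoted)

# Outside quotes, each backslash is consumed as an escape character.
from_unquoted = shlex.split(unquoted)

print(from_quoted)
print(from_unquoted)
```

So the approach looks safest when every path is quoted, as in the question's sample, and none contains a doubled backslash.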

Upvotes: 12

Redsplinter

Reputation: 163

This was my solution. It parses most sane input exactly the same as if it was passed into the command line directly.

import re
def simpleParse(input_):
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'

    return [re.sub(r'"{1,2}',reduce_,z.strip()) for z in re.findall(rex,input_)]

Use case: Collecting a bunch of single shot scripts into a utility launcher without having to redo command input much.

Edit: Got OCD about the stupid way that the command line handles crappy quoting and wrote the below:

import re
# trial is the input string to tokenize; it must be defined before this runs
tokens = list()
reading = False
qc = 0
lq = 0
begin = 0
for z in range(len(trial)):
    char = trial[z]
    if re.match(r'[^\s]', char):
        if not reading:
            reading = True
            begin = z
            if re.match(r'"', char):
                begin = z
                qc = 1
            else:
                begin = z - 1
                qc = 0
            lc = begin
        else:
            if re.match(r'"', char):
                qc = qc + 1
                lq = z
    elif reading and qc % 2 == 0:
        reading = False
        if lq == z - 1:
            tokens.append(trial[begin + 1: z - 1])
        else: 
            tokens.append(trial[begin + 1: z])
if reading:
    tokens.append(trial[begin + 1: len(trial) ])
tokens = [re.sub(r'"{1,2}',lambda y:'' if y.group(0) == '"' else '"', z) for z in tokens]

Upvotes: 0

Thomas Jung

Reputation: 33092

Finding all regular expression matches will do it:

import re

input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

re.findall('".+?"', input)  # or '"[^"]+"'

This will return the list of file names, still wrapped in their quotes:

['"Y:\\DATA\\00001\\SERVER\\DATA.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']

To get the file name without quotes use:

[f[1:-1] for f in re.findall('".+?"', input)]

or use re.finditer:

[f.group(1) for f in re.finditer('"(.+?)"', input)]

Upvotes: 3
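A further variation, offered only as a sketch: when the pattern contains exactly one capturing group, re.findall returns the group contents rather than the whole match, so the quotes can be dropped in a single call. The name line is used here to avoid shadowing the built-in input.

```python
import re

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# One capturing group: findall yields the text between each pair of quotes.
paths = re.findall(r'"([^"]+)"', line)
print(paths)
```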

HennyH

Reputation: 7944

I think what you want is to extract the file paths, which are separated by spaces. That is, you want to split the line between the quoted items. I.e., with a line

"FILE PATH" "FILE PATH 2"

You want

["FILE PATH","FILE PATH 2"]

In which case:

import re
with open('file.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")',line))

With file.txt:

"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"

Outputs:

>>> 
['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
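If the surrounding quotes and the trailing newline are not wanted, a possible follow-up step is sketched below. It assumes each line looks like the sample in file.txt; split_quoted_paths is a hypothetical helper name.

```python
import re

def split_quoted_paths(line):
    """Split between adjacent quoted items, then drop the enclosing quotes."""
    # strip() removes the newline that iterating over a file leaves on each line.
    parts = re.split(r'(?<=")\s(?=")', line.strip())
    return [part.strip('"') for part in parts]

sample = '"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT" "V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"\n'
result = split_quoted_paths(sample)
print(result)
```

The space inside DATA MINER.TXT is left alone because it is not sitting between two quote characters.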

Upvotes: 0

user1907906

Reputation:

You must escape the " when the string is delimited by double quotes:

input.split("\"")

results in

['\n',
 'Y:\\DATA\x0001\\SERVER\\DATA.TXT',
 ' ',
 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT',
 '\n']

To drop the resulting empty and whitespace-only entries:

[line for line in [line.strip() for line in input.split("\"")] if line]

results in

['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']

Upvotes: 14
