user1614796

Reputation: 99

Breaking up substrings in Python based on characters

I am trying to write code that takes a string and extracts specific data from it. I know the data will look like the line below, and I only need the values inside the " " marks, not the marks themselves.

inputString = 'type="NN" span="123..145" confidence="1.0" '

Is there a way to extract the substring between two given characters, using them as the start and stop points?

Upvotes: 1

Views: 1102

Answers (3)

Pascal Bugnion

Reputation: 4928

You could split the string at each space to get a list of 'key="value"' substrings and then use regular expressions to parse the substrings.

Using your input string:

>>> input_string = 'type="NN" span="123..145" confidence="1.0" '
>>> input_string_split = input_string.split()
>>> print input_string_split
['type="NN"', 'span="123..145"', 'confidence="1.0"']

Then use regular expressions:

>>> import re
>>> pattern = r'"([^"]+)"'
>>> for substring in input_string_split:
...     match_obj = re.search(pattern, substring)
...     print match_obj.group(1)
...
NN
123..145
1.0

The regular expression '"([^"]+)"' matches anything within quotation marks (provided there is at least one character). The parentheses form a capture group: they mark the part of the match that you are interested in, which group(1) then returns.
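
To make the "at least one character" caveat concrete, here is a small sketch comparing `+` with `*` inside the capture group. The `note=""` field is a made-up example, not part of the question's data, and print() is used so the sketch runs on Python 3.

```python
import re

# Hypothetical field with an empty value; not part of the question's data.
empty_field = 'note=""'

# '+' needs at least one character between the quotes, so nothing matches.
plus_match = re.search(r'"([^"]+)"', empty_field)

# '*' also accepts zero characters, so the empty value is captured.
star_match = re.search(r'"([^"]*)"', empty_field)

print(plus_match)                  # None
print(repr(star_match.group(1)))   # ''
```

With `+`, calling `.group(1)` on such a field would fail because the match object is None, so a check for None may be worth adding if empty values can occur.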

Upvotes: 0

sureshvv

Reputation: 4422

fields = inputString.split('"')
print fields[1], fields[3], fields[5]
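
This works because splitting on the " character alternates between text outside and inside the quotes, so the quoted values end up at the odd indices. A small sketch that generalizes the idea with a slice, so the number of fields does not have to be known in advance. It assumes the quotes are balanced and never appear inside a value; the name `values` is only illustrative, and print() is used so the sketch runs on Python 3.

```python
inputString = 'type="NN" span="123..145" confidence="1.0" '

# Splitting on '"' alternates between text outside and inside the quotes,
# so the quoted values sit at indices 1, 3, 5, ...
fields = inputString.split('"')
values = fields[1::2]

print(values)  # ['NN', '123..145', '1.0']
```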

Upvotes: 0

hochl

Reputation: 12930

You can extract all the text between pairs of " characters using regular expressions:

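import re
inputString = 'type="NN" span="123..145" confidence="1.0" '
pat = re.compile('"([^"]*)"')
strings = []  # collected values; must exist before the loop appends to it
while True:
    mat = pat.search(inputString)
    if mat is None:
        break
    strings.append(mat.group(1))
    # continue searching after the end of the current match
    inputString = inputString[mat.end():]
print strings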

or, easier:

import re
inputString='type="NN" span="123..145" confidence="1.0" '
strings=re.findall('"([^"]*)"', inputString)
print strings

Output for both versions:

['NN', '123..145', '1.0']
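
If the start and stop positions mentioned in the question are also of interest, `re.finditer` yields match objects that expose them. A minimal sketch, under the assumption that offsets into the original string are what is wanted; the name `positions` is only illustrative, and print() is used so the sketch runs on Python 3.

```python
import re

inputString = 'type="NN" span="123..145" confidence="1.0" '

# Each match object knows where its capture group starts and ends.
positions = [(m.group(1), m.start(1), m.end(1))
             for m in re.finditer(r'"([^"]*)"', inputString)]

for value, start, stop in positions:
    # Slicing the original string with the offsets gives back the value.
    assert inputString[start:stop] == value
    print(value, start, stop)
```

The offsets refer to the text between the quotes, so the " marks themselves are excluded.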

Upvotes: 2
