Reputation: 1139
I'm trying to do some very simple text processing: I want to collapse every run of spaces (except those inside double quotes " ") into a single space. I also have a semantic action that depends on each character read, which is why I don't want to use a regex. It's a kind of pseudo-FSM model.
So here's the deal:
s = '''that's my string, " keep these spaces " but reduce these '''
Desired output:
that's my string, " keep these spaces " but reduce these
What I would like to do is something like this (I'm leaving out the '"' case to keep the example simple):
out = ""
for i in range(len(s)):
    if s[i].isspace():
        out += ' '
        while s[i].isspace():
            i += 1
    else:
        out += s[i]
I don't quite understand how the scopes are created or shared in this case.
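On the scoping point: reassigning i inside a for loop over range() only lasts until the next iteration, when the for statement rebinds i to the next value from the range, so the inner i += 1 never skips anything. The inner while can also run past the end of the string and raise IndexError when s ends in whitespace. A minimal sketch of the same idea with an explicit while loop, which does let the loop body control the index (collapse_spaces is a hypothetical name, and the '"' case is still left out):

```python
def collapse_spaces(s):
    """Collapse every run of whitespace in s into a single space."""
    out = ""
    i = 0
    while i < len(s):
        if s[i].isspace():
            out += ' '
            # Skip the rest of the run; the bounds check avoids IndexError
            # when the string ends in whitespace.
            while i < len(s) and s[i].isspace():
                i += 1
        else:
            out += s[i]
            i += 1
    return out


print(collapse_spaces("that's   my    string  "))
```

Because i is advanced by hand here, the skip inside the inner loop actually carries over to the next pass of the outer loop.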
Thank you for any advice.
Upvotes: 2
Views: 3360
Reputation: 1816
I'm a bit concerned about whether this solution is readable. I modified the string the OP suggested so that it includes several pairs of double quotes.
s = '''that's my string, " keep these spaces "" as well as these " reduce these" keep these spaces too " but not these '''
s_split = s.split('"')
# The substrings in odd positions of list s_split should retain their spaces.
# These elements have however lost their double quotes during .split('"'),
# so add them back for the new string. For the substrings in even positions,
# remove the multiple spaces in between by splitting them again using .split()
# and joining them with a single space. However, this will not conserve
# leading and trailing spaces. In order to conserve them, add a dummy
# character (in this case '-') at the start and end of the substring before
# the split. Remove the dummy bits after the split.
#
# Finally join the elements in new_string_list to create the desired string.
new_string_list = ['"' + x + '"' if i%2 == 1
                   else ' '.join(('-' + x + '-').split())[1:-1]
                   for i,x in enumerate(s_split)]
new_string = ''.join(new_string_list)
print(new_string)
Output is
>>>that's my string, " keep these spaces "" as well as these " reduce these" keep these spaces too " but not these
Upvotes: 0
Reputation: 168616
I also have some semantic action dependent on each character read ... It's some kind of pseudo FSM model.
You could actually implement an FSM:
s = '''that's my string, " keep these spaces " but reduce these '''
normal, quoted, eating = 0,1,2
state = eating
result = ''
for ch in s:
    if (state, ch) == (eating, ' '):
        continue
    elif (state,ch) == (eating, '"'):
        result += ch
        state = quoted
    elif state == eating:
        result += ch
        state = normal
    elif (state, ch) == (quoted, '"'):
        result += ch
        state = normal
    elif state == quoted:
        result += ch
    elif (state,ch) == (normal, '"'):
        result += ch
        state = quoted
    elif (state,ch) == (normal, ' '):
        result += ch
        state = eating
    else: # state == normal
        result += ch
print(result)
Or, the data-driven version:
# Each state maps a character to a function returning (next_state, output).
# The None key is the default transition for any other character.
actions = {
    'normal' : {
        ' ' : lambda x: ('eating', ' '),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'eating' : {
        ' ' : lambda x: ('eating', ''),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'quoted' : {
        '"' : lambda x: ('normal', '"'),
        '\\': lambda x: ('escaped', '\\'),
        None: lambda x: ('quoted', x)
    },
    'escaped' : {
        None: lambda x: ('quoted', x)
    }
}
def reduce(s):
    result = ''
    state = 'eating'
    for ch in s:
        state, ch = actions[state].get(ch, actions[state][None])(ch)
        result += ch
    return result
s = '''that's my string, " keep these spaces " but reduce these '''
print(reduce(s))
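Since the question mentions a semantic action for each character read, here is a rough sketch of how such a hook could sit on the same state machine. It is only an illustration under assumptions: reduce_spaces_fsm and the on_char callback are hypothetical names, not part of the answer above, and the states mirror the table ('eating', 'normal', 'quoted', 'escaped').

```python
def reduce_spaces_fsm(s, on_char=None):
    """Collapse runs of spaces outside double quotes.

    on_char is a hypothetical hook, called as on_char(state, ch) for every
    character read, before the transition is taken.
    """
    state = 'eating'          # start in 'eating' so leading spaces are dropped
    out = []
    for ch in s:
        if on_char is not None:
            on_char(state, ch)
        if state == 'escaped':
            # The character after a backslash is copied as-is.
            out.append(ch)
            state = 'quoted'
        elif state == 'quoted':
            out.append(ch)
            if ch == '"':
                state = 'normal'
            elif ch == '\\':
                state = 'escaped'
        elif ch == '"':
            out.append(ch)
            state = 'quoted'
        elif ch == ' ':
            # Emit only the first space of a run.
            if state == 'normal':
                out.append(ch)
            state = 'eating'
        else:
            out.append(ch)
            state = 'normal'
    return ''.join(out)


seen = []
print(reduce_spaces_fsm('a   b "c   d"  e', on_char=lambda st, ch: seen.append((st, ch))))
```

The 'escaped' state is what lets a backslash-escaped double quote inside a quoted section pass through without ending that section.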
Upvotes: 1
Reputation: 34272
As already suggested, I'd use the standard shlex module instead, with some adjustments:
import shlex
def reduce_spaces(s):
    lex = shlex.shlex(s)
    lex.quotes = '"' # ignore single quotes
    lex.whitespace_split = True # use only spaces to separate tokens
    tokens = iter(lex.get_token, lex.eof) # exhaust the lexer
    return ' '.join(tokens)
>>> s = '''that's my string, " keep these spaces " but reduce these '''
>>> reduce_spaces(s)
'that\'s my string, " keep these spaces " but reduce these'
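Two things to watch for with shlex on arbitrary text: by default '#' starts a comment, so the rest of the line is dropped, and an unbalanced double quote makes the lexer raise ValueError. A hedged sketch of a variant that accounts for both (reduce_spaces_safe is a hypothetical name, and returning None on bad input is just one possible choice):

```python
import shlex


def reduce_spaces_safe(s):
    """Like the shlex approach above, but tolerant of '#' and unbalanced quotes."""
    lex = shlex.shlex(s)
    lex.quotes = '"'             # treat only double quotes as quotes
    lex.whitespace_split = True  # split tokens on whitespace only
    lex.commenters = ''          # do not treat '#' as the start of a comment
    try:
        return ' '.join(iter(lex.get_token, lex.eof))
    except ValueError:
        # Raised by shlex when a double quote is never closed.
        return None


print(reduce_spaces_safe('a   "b   c"  # d'))
```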
Upvotes: 1
Reputation:
It is a bit of a hack, but you could reduce runs of spaces to a single space with a one-liner.
one_space = lambda s : ' '.join([part for part in s.split(' ') if part])
This joins the parts that are not empty, that is, the parts that contain non-space characters, separated by a single space. The harder part, of course, is separating out the exceptional part in double quotes. In real production code you would want to be careful about cases like escaped double quotes as well. But presuming you have only well-behaved cases, you could separate those out as well. I presume that in real code you may have more than one double-quoted section.
You can do this by splitting your string on double quotes, applying one_space only to the even-indexed items, and appending the odd-indexed items unchanged. I believe this is right from working through some examples.
def fix_spaces(s):
    dbl_parts = s.split('"')
    # Even-indexed parts lie outside the quotes; odd-indexed parts lie inside.
    normalize = lambda i: one_space(dbl_parts[i]) if not i%2 else dbl_parts[i]
    # Rejoin with the double quotes that split('"') removed.
    # Note: one_space also strips leading and trailing spaces, so a space
    # directly next to a quote is dropped.
    return '"'.join([normalize(i) for i in range(len(dbl_parts))])
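One caveat with splitting on '"' and normalizing each outside part with split and join is that the spaces directly next to a quote get stripped. A minimal sketch that keeps one space at each edge instead, assuming balanced quotes and no escaped quotes (collapse_part and fix_spaces_keep_edges are hypothetical names):

```python
def collapse_part(part):
    """Collapse whitespace runs to one space, keeping one at each edge if present."""
    words = part.split()
    if not words:
        # Nothing but whitespace (or nothing at all).
        return ' ' if part else ''
    lead = ' ' if part[0].isspace() else ''
    trail = ' ' if part[-1].isspace() else ''
    return lead + ' '.join(words) + trail


def fix_spaces_keep_edges(s):
    parts = s.split('"')
    # Even indexes are outside the quotes, odd indexes are inside.
    return '"'.join(part if i % 2 else collapse_part(part)
                    for i, part in enumerate(parts))


print(fix_spaces_keep_edges('a   "b   c"   d  '))
```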
Upvotes: 0
Reputation: 3194
Use shlex to split your string into quoted and unquoted parts, then use a regex on the unquoted parts to replace each sequence of whitespace with a single space.
Upvotes: 1
Reputation: 113915
i = iter((i for i,char in enumerate(s) if char=='"'))
zones = list(zip(*[i]*2)) # a list of all the "zones" where spaces should not be manipulated
answer = []
space = False
for i,char in enumerate(s):
    if not any(zone[0] <= i <= zone[1] for zone in zones):
        if char.isspace():
            if not space:
                answer.append(char)
        else:
            answer.append(char)
    else:
        answer.append(char)
    space = char.isspace()
print(''.join(answer))
And the output:
>>> s = '''that's my string, " keep these spaces " but reduce these '''
>>> i = iter((i for i,char in enumerate(s) if char=='"'))
>>> zones = list(zip(*[i]*2))
>>> answer = []
>>> space = False
>>> for i,char in enumerate(s):
...     if not any(zone[0] <= i <= zone[1] for zone in zones):
...         if char.isspace():
...             if not space:
...                 answer.append(char)
...         else:
...             answer.append(char)
...     else:
...         answer.append(char)
...     space = char.isspace()
...
>>> print(''.join(answer))
that's my string, " keep these spaces " but reduce these
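The zip(*[i]*2) idiom above may be the least obvious step: it passes the same iterator to zip twice, so each tuple consumes two consecutive quote positions. A small sketch of just that pairing step, written with a hypothetical helper name (quote_zones) and a made-up sample string:

```python
def quote_zones(s):
    """Return (open_index, close_index) pairs for consecutive double quotes."""
    positions = iter(idx for idx, ch in enumerate(s) if ch == '"')
    # zip pulls from the same iterator for both slots, pairing 1st with 2nd,
    # 3rd with 4th, and so on. A trailing unpaired quote is silently dropped.
    return list(zip(positions, positions))


print(quote_zones('a "b c" d "e" f'))
```

With an odd number of quotes the last one ends up in no zone, so the text after it is treated as unquoted.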
Upvotes: 0