dakov

Reputation: 1139

Python loop through string in nested for loops

I'm trying to do some very simple text processing, or reduction. I want to replace every run of spaces (except those inside " ") with a single space. I also have a semantic action that depends on each character read, which is why I don't want to use a regex. It's a kind of pseudo-FSM model.

So here's the deal:

s = '''that's my     string, "   keep these spaces     "    but reduce these '''

Desired output:

that's my string, "   keep these spaces     " but reduce these

What I would like to do is something like this (I'm leaving out the '"' case to keep the example simple):

out = ""
for i in range(len(s)):

  if s[i].isspace():
    out += ' '
    while s[i].isspace():
      i += 1

  else:
    out += s[i]

I don't quite understand how the scopes are created or shared in this case.
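A note on the loop above: `for i in range(len(s))` assigns the next value from `range` to `i` at the start of every iteration, so the `i += 1` in the inner `while` is overwritten as soon as the outer loop advances. This is a matter of rebinding rather than scope. Below is a minimal sketch of the same idea with an explicit `while` loop, where the index is under manual control. The name `squeeze_spaces` is hypothetical, and the '"' case is still left out.

```python
def squeeze_spaces(s):
    """Collapse each run of whitespace into one space (quotes not handled)."""
    out = []
    i = 0
    while i < len(s):
        if s[i].isspace():
            out.append(' ')
            # Skip the rest of this run; the bounds check avoids an
            # IndexError when the string ends in whitespace.
            while i < len(s) and s[i].isspace():
                i += 1
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)


print(repr(squeeze_spaces("that's my     string,   ok ")))
```

With a `while` loop the inner increments persist, because nothing reassigns `i` behind the loop body's back.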

Thank you for advice.

Upvotes: 2

Views: 3360

Answers (6)

Pulimon

Reputation: 1816

I'm a bit concerned about whether this solution is readable. I modified the string the OP suggested so that it includes multiple double-quote pairs.

s = '''that's my     string,   "   keep these spaces     "" as    well    as these    "    reduce these"   keep these spaces too   "   but not these  '''
s_split = s.split('"')

# The substrings in odd positions of list s_split should retain their spaces.
# These elements have, however, lost their double quotes during .split('"'),
# so add them back for the new string. For the substrings in even positions,
# remove the multiple spaces in between by splitting them again using .split()
# and joining them with a single space. However, this will not conserve
# leading and trailing spaces. In order to conserve them, add a dummy
# character (in this case '-') at the start and end of the substring before
# the split. Remove the dummy bits after the split.
#
# Finally join the elements in new_string_list to create the desired string.

new_string_list = ['"' + x + '"' if i%2 == 1
                   else ' '.join(('-' + x + '-').split())[1:-1]                   
                   for i,x in enumerate(s_split)]
new_string = ''.join(new_string_list)
print(new_string)

Output is

>>>that's my string, "   keep these spaces     "" as    well    as these    " reduce these"   keep these spaces too   " but not these 
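The dummy-character trick described in the comment can be seen on a single substring. A small sketch (the sample string `x` is made up for illustration):

```python
x = '   reduce   these   '

# Plain split/join collapses the runs but also drops the edge spaces.
plain = ' '.join(x.split())

# Padding with a dummy '-' on each side turns the edge spaces into interior
# ones, so one space survives on each side once the dummies are sliced off.
padded = ' '.join(('-' + x + '-').split())[1:-1]

print(repr(plain))   # 'reduce these'
print(repr(padded))  # ' reduce these '
```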

Upvotes: 0

Robᵩ

Reputation: 168616

I also have some semantic action dependent on each character read ... It's some kind of pseudo FSM model.

You could actually implement an FSM:

s = '''that's my     string, "   keep these spaces     "    but reduce these '''


normal, quoted, eating = 0,1,2
state = eating
result = ''
for ch in s:
  if (state, ch) == (eating, ' '):
    continue
  elif (state,ch) == (eating, '"'):
    result += ch
    state = quoted
  elif state == eating:
    result += ch
    state = normal
  elif (state, ch) == (quoted, '"'):
    result += ch
    state = normal
  elif state == quoted:
    result += ch
  elif (state,ch) == (normal, '"'):
    result += ch
    state = quoted
  elif (state,ch) == (normal, ' '):
    result += ch
    state = eating
  else: # state == normal
    result += ch

print(result)

Or, the data-driven version:

actions = {
    'normal' : {
        ' ' : lambda x: ('eating', ' '),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'eating' : {
        ' ' : lambda x: ('eating', ''),
        '"' : lambda x: ('quoted', '"'),
        None: lambda x: ('normal', x)
    },
    'quoted' : {
        '"' : lambda x: ('normal', '"'),
        '\\': lambda x: ('escaped', '\\'),
        None: lambda x: ('quoted', x)
    },
    'escaped' : {
        None: lambda x: ('quoted', x)
    }
}

def reduce(s):
    result = ''
    state = 'eating'
    for ch in s:
        state, ch = actions[state].get(ch, actions[state][None])(ch)
        result += ch
    return result

s = '''that's my     string, "   keep these spaces     "    but reduce these '''
print(reduce(s))

Upvotes: 1

bereal

Reputation: 34272

As already suggested, I'd use the standard shlex module instead, with some adjustments:

import shlex

def reduce_spaces(s):
    lex = shlex.shlex(s)
    lex.quotes = '"'             # ignore single quotes
    lex.whitespace_split = True  # use only spaces to separate tokens
    tokens = iter(lex.get_token, lex.eof)  # exhaust the lexer
    return ' '.join(tokens)

>>> s = '''that's my   string, "   keep these spaces     "   but reduce these '''
>>> reduce_spaces(s)
'that\'s my string, "   keep these spaces     " but reduce these'

Upvotes: 1

user1969453

Reputation:

It is a bit of a hack, but you could reduce runs of spaces to a single space with a one-liner.

one_space = lambda s : ' '.join([part for part in s.split(' ') if part])

This joins the non-empty parts, that is, those that contain something other than spaces, separated by a single space. The harder part, of course, is separating out the exceptional part in double quotes. In real production code you would also want to be careful of cases like escaped double quotes. But presuming you have only well-behaved cases, you could separate those out as well. I presume that in real code you may have more than one double-quoted section.
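To see why filtering out the empty parts works, it may help to look at what `split(' ')` returns. A small sketch with a made-up sample string:

```python
text = 'a   b  c'

# Splitting on a single space yields an empty string for each extra space.
parts = text.split(' ')
print(parts)  # ['a', '', '', 'b', '', 'c']

# Dropping the empty strings and joining leaves single spaces.
print(' '.join([part for part in parts if part]))  # a b c

# str.split() with no argument skips the empty strings itself
# (and splits on any whitespace, not only ' ').
print(' '.join(text.split()))  # a b c
```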

You can do this by splitting your string on double quotes into a list, applying the space reduction only to the even-indexed items, and appending the odd-indexed items unchanged. I believe this is right, from working through some examples.

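def fix_spaces(s):
  dbl_parts = s.split('"')
  # Even-indexed parts lie outside the quotes; odd-indexed parts are quoted text.
  # Caveat: one_space also strips the spaces directly next to each quote.
  normalize = lambda i: one_space(dbl_parts[i]) if not i%2 else dbl_parts[i]
  # Rejoin with the double quotes that split('"') removed.
  return '"'.join([normalize(i) for i in range(len(dbl_parts))])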

Upvotes: 0

Filip Malczak

Reputation: 3194

Use shlex to parse your string into quoted and unquoted parts, then use a regex on the unquoted parts to replace each run of whitespace with a single space.
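A minimal sketch of this two-step idea. Note that it swaps shlex for `re.split` to find the quoted parts, which is a substitution rather than what is prescribed above, and it assumes there are no escaped quotes. The name `reduce_unquoted` is hypothetical.

```python
import re


def reduce_unquoted(s):
    # Split on double-quoted segments; the capturing group keeps them
    # in the result, at the odd indices.
    parts = re.split(r'("[^"]*")', s)
    # Leave quoted segments untouched; collapse whitespace runs elsewhere.
    return ''.join(
        part if i % 2 else re.sub(r'\s+', ' ', part)
        for i, part in enumerate(parts)
    )


print(reduce_unquoted('a   b "c   d"   e'))  # a b "c   d" e
```

An unmatched trailing quote is not treated as opening a quoted part here, so the text after it is reduced like any other unquoted text.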

Upvotes: 1

inspectorG4dget

Reputation: 113915

i = iter((i for i,char in enumerate(s) if char=='"'))
zones = list(zip(*[i]*2))  # a list of all the "zones" where spaces should not be manipulated
answer = []
space = False
for i,char in enumerate(s):
    if not any(zone[0] <= i <= zone[1] for zone in zones):
        if char.isspace():
            if not space:
                answer.append(char)
        else:
            answer.append(char)
    else:
        answer.append(char)
    space = char.isspace()

print(''.join(answer))

And the output:

>>> s = '''that's my     string, "   keep these spaces     "    but reduce these '''
>>> i = iter((i for i,char in enumerate(s) if char=='"'))
>>> zones = list(zip(*[i]*2))
>>> answer = []
>>> space = False
>>> for i,char in enumerate(s):
...     if not any(zone[0] <= i <= zone[1] for zone in zones):
...         if char.isspace():
...             if not space:
...                 answer.append(char)
...         else:
...             answer.append(char)
...     else:
...         answer.append(char)
...     space = char.isspace()
... 
>>> print(''.join(answer))
that's my string, "   keep these spaces     " but reduce these 
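The `zip(*[i]*2)` line may be the least obvious part. It passes the same iterator to `zip` twice, so each tuple consumes two consecutive quote positions. A small sketch on a made-up string (the names `sample` and `quote_positions` are only for illustration):

```python
sample = 'a "b c" d "e"'

# One shared iterator over the indices of every double quote.
quote_positions = iter(i for i, ch in enumerate(sample) if ch == '"')

# zip pulls from the same iterator for both tuple slots, which pairs
# each opening quote index with the following closing quote index.
zones = list(zip(*[quote_positions] * 2))

print(zones)  # [(2, 6), (10, 12)]
```

With an odd number of quotes, the last unmatched position is silently dropped, so text after a lone trailing quote is treated as unquoted.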

Upvotes: 0
