UAT-PyFo

Reputation: 23

Replacing whitespace characters with the '\t' string

I am trying to replace whitespace characters with the literal string '\t'. The text file looks like this:

255 255 255             white
  0   0   0             black
 47  79  79             dark slate gray
 47  79  79             DarkSlateGray
 47  79  79             DarkSlateGrey
105 105 105             dim gray

My code looks like:

import re
with open('rgb.txt', 'r') as f:
    for line in f:
        print(re.sub(r'\s+', r'\\t', line))

The above code gives:

 255\t255\t255\twhite
 \t0\t0\t0\tblack
 \t47\t79\t79\tdark\tslate\tgray
 \t47\t79\t79\tDarkSlateGray
 \t47\t79\t79\tDarkSlateGrey
 105\t105\t105\tdim\tgray

However, I only want to replace the runs of whitespace from the first number up to the color name, not the leading whitespace and not the spaces inside the color name. The output I want is:

 255\t255\t255\twhite
 0\t0\t0\tblack
 47\t79\t79\tdarkslategray
 47\t79\t79\tDarkSlateGray
 47\t79\t79\tDarkSlateGrey
 105\t105\t105\tdimgray

Upvotes: 1

Views: 358

Answers (4)

Toto

Reputation: 91508

You can do it in two passes:

import re
txt = """
255 255 255             white
  0   0   0             black
 47  79  79             dark slate gray
 47  79  79             DarkSlateGray
 47  79  79             DarkSlateGrey
105 105 105             dim gray
"""
for line in txt.strip().split('\n'):
    line = re.sub(r'^\s+', '', line)  # remove leading spaces
    print(re.sub(r'(?<![a-zA-Z])(\s+)', r'\\t', line))  # replace remaining whitespace with \t unless it is preceded by a letter

Output:

255\t255\t255\twhite
0\t0\t0\tblack
47\t79\t79\tdark slate gray
47\t79\t79\tDarkSlateGray
47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray

Upvotes: 0

jmcgriz

Reputation: 3368

I'm not familiar enough with Python to answer quickly and accurately in it, but here's JavaScript showing the regex. If the first three fields are always strings of digits, you can handle it this way:

var input = `255 255 255             white
  0   0   0             black
 47  79  79             dark slate gray
 47  79  79             DarkSlateGray
 47  79  79             DarkSlateGrey
105 105 105             dim gray`

var output = input.replace(/(\d+)\s+/g, '$1\\t')

console.log(output)
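
For readers working in Python, the same pattern can be sketched with re.sub. This is only a rough translation, not the answerer's code: the helper name tabify is made up for illustration, and the strip() call is an added assumption so that leading padding and the trailing newline (which would otherwise match after a color name ending in a digit) do not produce a stray \t.

```python
import re

# Sample rows in the same layout as the question's rgb.txt
lines = [
    "255 255 255             white\n",
    "  0   0   0             black\n",
    " 47  79  79             dark slate gray\n",
]

def tabify(line):
    # Hypothetical helper: replace whitespace that follows a run of digits
    # with a literal backslash-t. strip() removes the leading padding and
    # the trailing newline before the pattern is applied.
    return re.sub(r'(\d+)\s+', r'\1\\t', line.strip())

for line in lines:
    print(tabify(line))
```

In the replacement string, \1 puts the matched digits back and \\ yields a single literal backslash, so the output contains the two characters \ and t rather than a real tab.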

Upvotes: 0

Ryszard Czech

Reputation: 18631

I suggest using nested re.subs:

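re.sub(r'^[\d\s]+', lambda x: re.sub(r'\s+', r'\\t', x.group()), line)

To get rid of the spaces at the start, call line.lstrip() before running the regex:

re.sub(r'^[\d\s]+', lambda x: re.sub(r'\s+', r'\\t', x.group()), line.lstrip())

The outer pattern ^[\d\s]+ matches all digits and whitespace at the start of the line, and the inner re.sub replaces each run of whitespace in that match with a single literal \t. The replacement is written r'\\t' because a plain '\t' would insert a real tab character instead of the two-character string.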

Output (for lines without .lstrip()):

255\t255\t255\twhite
\t0\t0\t0\tblack
\t47\t79\t79\tdark slate gray
\t47\t79\t79\tDarkSlateGray
\t47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray

Output (for lines with .lstrip()):

255\t255\t255\twhite
0\t0\t0\tblack
47\t79\t79\tdark slate gray
47\t79\t79\tDarkSlateGray
47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray
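
A self-contained sketch of the nested approach, for anyone who wants to run it directly. The function name tabify_prefix and the sample rows are illustrative assumptions, and strip() is used here instead of lstrip() so that the trailing newline is dropped as well.

```python
import re

def tabify_prefix(line):
    # Hypothetical helper: only the leading run of digits and whitespace
    # is rewritten, so spaces inside the color name are left untouched.
    line = line.strip()
    return re.sub(
        r'^[\d\s]+',
        # r'\\t' produces a literal backslash followed by t, not a tab
        lambda m: re.sub(r'\s+', r'\\t', m.group()),
        line,
    )

sample = """255 255 255             white
  0   0   0             black
105 105 105             dim gray"""

for row in sample.splitlines():
    print(tabify_prefix(row))
```

Because the outer match stops at the first character that is neither a digit nor whitespace, a name such as "dim gray" passes through unchanged.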

Upvotes: 1

Green Cloak Guy

Reputation: 24691

You can match whitespace immediately following a digit, which should solve the problem:

>>> import re
>>> txt = """255 255 255             white
...   0   0   0             black
...  47  79  79             dark slate gray
...  47  79  79             DarkSlateGray
...  47  79  79             DarkSlateGrey
... 105 105 105             dim gray"""
>>> for line in txt.split('\n'):
...     print(re.sub(r'[0-9]\s+', lambda m:m.group(0)[0]+r'\t', line))
...
255\t255\t255\twhite
  0\t0\t0\tblack
 47\t79\t79\tdark slate gray
 47\t79\t79\tDarkSlateGray
 47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray

I couldn't find a quick way to just ignore the digit in the replacement, so I just made a lambda instead that takes the digit that was matched and appends a \t to it.
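
One possible way to leave the digit out of the replacement is a lookbehind assertion, which checks that a digit precedes the whitespace without making it part of the match. The following is a minimal sketch of that variant, not a drop-in from the answer above; the variable names are illustrative.

```python
import re

line = " 47  79  79             dark slate gray"

# (?<=[0-9]) asserts that a digit comes right before the whitespace but does
# not consume it, so only the whitespace is replaced and no lambda is needed.
result = re.sub(r'(?<=[0-9])\s+', r'\\t', line)
print(result)
```

As with the lambda version, leading whitespace is not preceded by a digit and so stays in place; calling line.lstrip() first would remove it.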

Upvotes: 2
