Reputation: 23
I am trying to replace whitespace characters with the literal string '\t'. The text file looks like this:
255 255 255 white
  0   0   0 black
 47  79  79 dark slate gray
 47  79  79 DarkSlateGray
 47  79  79 DarkSlateGrey
105 105 105 dim gray
My code looks like:
import re
with open('rgb.txt', 'r') as f:
    for line in f:
        print(re.sub(r'\s+', r'\\t', line))
The above code gives:
255\t255\t255\twhite
\t0\t0\t0\tblack
\t47\t79\t79\tdark\tslate\tgray
\t47\t79\t79\tDarkSlateGray
\t47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim\tgray
However, I only want to replace the whitespace between the numbers, up to the start of the color name, and not the whitespace inside the color name. The output I want is:
255\t255\t255\twhite
0\t0\t0\tblack
47\t79\t79\tdarkslategray
47\t79\t79\tDarkSlateGray
47\t79\t79\tDarkSlateGrey
105\t105\t105\tdimgray
Upvotes: 1
Views: 358
Reputation: 91508
You can do it in two passes:
import re
txt = """
255 255 255 white
0 0 0 black
47 79 79 dark slate gray
47 79 79 DarkSlateGray
47 79 79 DarkSlateGrey
105 105 105 dim gray
"""
for line in txt.split('\n'):
    line = re.sub(r'^\s+', '', line)  # remove leading spaces
    # replace the remaining whitespace with \t unless it is preceded by a letter
    print(re.sub(r'(?<![a-zA-Z])(\s+)', r'\\t', line))
Output:
255\t255\t255\twhite
0\t0\t0\tblack
47\t79\t79\tdark slate gray
47\t79\t79\tDarkSlateGray
47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray
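The same two steps (drop the leading whitespace, then separate only the three numeric fields) can also be sketched without a regex, using str.split with a maxsplit. This is an alternative technique, not the regex approach above. The function name to_tab_fields and its collapse_name flag are made up for illustration, and the sketch assumes every non-empty line has three numeric fields followed by a name.

```python
def to_tab_fields(line, collapse_name=False):
    # strip() drops leading/trailing whitespace (including the newline);
    # split(None, 3) splits on whitespace runs at most three times, so the
    # color name stays together as the fourth field.
    parts = line.strip().split(None, 3)
    if collapse_name and len(parts) == 4:
        # Optionally remove the whitespace inside the color name as well.
        parts[3] = ''.join(parts[3].split())
    # Join with a literal backslash followed by "t" (two characters, not a tab).
    return '\\t'.join(parts)


# Hypothetical sample in the same layout as the question's file.
sample = """255 255 255 white
  0   0   0 black
 47  79  79 dark slate gray
105 105 105 dim gray"""

for line in sample.splitlines():
    print(to_tab_fields(line))
```

With collapse_name=True the name is squeezed too ("dark slate gray" becomes "darkslategray"), which is what the desired output in the question shows.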
Upvotes: 0
Reputation: 3368
I'm not familiar enough with Python to answer quickly and accurately in it, but here's JavaScript showing the regex. If the first three fields will always be strings of digits, you can handle it this way:
var input = `255 255 255 white
0 0 0 black
47 79 79 dark slate gray
47 79 79 DarkSlateGray
47 79 79 DarkSlateGrey
105 105 105 dim gray`
var output = input.replace(/(\d+)\s+/g, '$1\\t')
console.log(output)
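For reference, a rough Python equivalent of the same regex might look like the sketch below. The helper name tabify_after_digits is hypothetical, and the call to lstrip() is an addition for lines that start with spaces. In a Python replacement template, \1 re-inserts the captured digits and \\ produces a single literal backslash.

```python
import re


def tabify_after_digits(line):
    # Replace the whitespace that follows each run of digits with a literal
    # backslash-t, keeping the digits themselves via the \1 backreference.
    return re.sub(r'(\d+)\s+', r'\1\\t', line.lstrip())


# Hypothetical sample in the same layout as the question's file.
sample = """255 255 255 white
  0   0   0 black
 47  79  79 dark slate gray
105 105 105 dim gray"""

for line in sample.split('\n'):
    print(tabify_after_digits(line))
```

One caveat: a color name that ends in a digit and is followed by whitespace (a trailing newline, for instance) would also match, so it is probably safer to strip the line ending first when reading from a file.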
Upvotes: 0
Reputation: 18631
I suggest using nested re.sub calls:
re.sub(r'^[\d\s]+', lambda x: re.sub(r'\s+', '\t', x.group()), line)
To get rid of the spaces at the start, use line.lstrip() before running the regex:
re.sub(r'^[\d\s]+', lambda x: re.sub(r'\s+', '\t', x.group()), line.lstrip())
The outer pattern, ^[\d\s]+, matches all digits and whitespace at the start of the line, and the inner re.sub replaces each run of whitespace in that match with a single tab.
Output (for lines without .lstrip()):
255\t255\t255\twhite
\t0\t0\t0\tblack
\t47\t79\t79\tdark slate gray
\t47\t79\t79\tDarkSlateGray
\t47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray
Output (for lines with .lstrip()):
255\t255\t255\twhite
0\t0\t0\tblack
47\t79\t79\tdark slate gray
47\t79\t79\tDarkSlateGray
47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray
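Note that '\t' in the snippets above is a real tab character, while the question asks for the two-character text \t. A minimal sketch of the nested approach that supports either is given below. The function name tabify_prefix and its sep parameter are illustrative, not part of the answer. The inner replacement is a callable so that sep is inserted verbatim instead of being parsed as a replacement template.

```python
import re


def tabify_prefix(line, sep='\t'):
    # The outer pattern grabs the leading run of digits and whitespace; the
    # inner re.sub replaces each whitespace run inside that prefix with sep.
    # Return values of callable replacements are inserted as-is, so a
    # backslash in sep is not interpreted as an escape.
    def fix(match):
        return re.sub(r'\s+', lambda _: sep, match.group())

    return re.sub(r'^[\d\s]+', fix, line.lstrip())


print(tabify_prefix(' 47  79  79 dark slate gray'))            # real tabs
print(tabify_prefix(' 47  79  79 dark slate gray', sep='\\t'))  # literal \t
```

This assumes no color name begins with a digit, since such a name would be absorbed into the numeric prefix.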
Upvotes: 1
Reputation: 24691
You can match whitespace immediately following a digit, which should solve the problem:
>>> import re
>>> txt = """255 255 255 white
... 0 0 0 black
... 47 79 79 dark slate gray
... 47 79 79 DarkSlateGray
... 47 79 79 DarkSlateGrey
... 105 105 105 dim gray"""
>>> for line in txt.split('\n'):
...     print(re.sub(r'[0-9]\s+', lambda m: m.group(0)[0] + r'\t', line))
...
255\t255\t255\twhite
0\t0\t0\tblack
47\t79\t79\tdark slate gray
47\t79\t79\tDarkSlateGray
47\t79\t79\tDarkSlateGrey
105\t105\t105\tdim gray
I couldn't find a quick way to ignore the digit in the replacement, so I used a lambda that takes the matched digit and appends a \t to it.
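One possible way to leave the digit out of the match altogether is a lookbehind assertion, which checks that a digit precedes the whitespace without consuming it. The sketch below is a variation on this answer, not its original code. The name tabify_lookbehind is made up, and lstrip() is added on the assumption that lines may start with spaces.

```python
import re


def tabify_lookbehind(line):
    # (?<=[0-9]) requires a digit immediately before the whitespace run but
    # keeps it out of the match, so only the whitespace gets replaced.
    # r'\\t' in the template yields a literal backslash followed by "t".
    return re.sub(r'(?<=[0-9])\s+', r'\\t', line.lstrip())


# Hypothetical sample in the same layout as the question's file.
txt = """255 255 255 white
  0   0   0 black
 47  79  79 dark slate gray
105 105 105 dim gray"""

for line in txt.split('\n'):
    print(tabify_lookbehind(line))
```

Whitespace inside the color name is preceded by a letter, so it is left untouched.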
Upvotes: 2