Reputation: 728
sed 's/\t/_tab_/3g'
I have a sed command that basically replaces all excess tab delimiters in my text document. My documents are supposed to be 3 columns, but occasionally there's an extra delimiter. I don't have control over the files.
I use the above command to clean up the documents. However, all my other operations on these files are in Python. Is there a way to do the same thing as the sed command in Python?
sample input:
Column1 Column2 Column3
James 1,203.33 comment1
Mike -3,434.09 testing testing 123
Sarah 1,343,342.23 there here
sample output:
Column1 Column2 Column3
James 1,203.33 comment1
Mike -3,434.09 testing_tab_testing_tab_123
Sarah 1,343,342.23 there_tab_here
Upvotes: 3
Views: 197
Reputation: 626747
You can read the file line by line, split each line on tabs, and if there are more than 3 items, join the 3rd item and everything after it with _tab_:
lines = []
with open('inputfile.txt', 'r') as fr:
    for line in fr:
        split = line.split('\t')
        if len(split) > 3:
            tmp = split[:2]  # Slice the first two items
            tmp.append("_tab_".join(split[2:]))  # Append the rest joined with _tab_
            lines.append("\t".join(tmp))  # Use the updated line
        else:
            lines.append(line)  # Else, put the line as is
The lines variable will contain something like:
Mike -3,434.09 testing_tab_testing_tab_123
Mike -3,434.09 testing_tab_256
No operation here
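The same split-and-join idea can be written more compactly with the maxsplit argument of str.split. This is only a minimal sketch, not part of the answer above; the function name collapse_extra_tabs and its parameters are made up for illustration.

```python
def collapse_extra_tabs(line, columns=3, filler="_tab_"):
    """Keep the first `columns - 1` tabs and replace any later ones with `filler`."""
    # maxsplit=columns - 1 yields at most `columns` parts; the last part
    # still contains every remaining tab (and the trailing newline, if any).
    parts = line.split("\t", columns - 1)
    parts[-1] = parts[-1].replace("\t", filler)
    return "\t".join(parts)


print(collapse_extra_tabs("Mike\t-3,434.09\ttesting\ttesting\t123\n"), end="")
```

Lines that already have 3 columns or fewer pass through unchanged, so the function can be applied to every line without a length check.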
Upvotes: 1
Reputation: 7631
You can mimic the sed behavior in Python:
import re
pattern = re.compile(r'\t')
string = 'Mike\t3,434.09\ttesting\ttesting\t123'
replacement = '_tab_'
count = -1
spans = []
start = 2  # Starting index of matches to replace (0 based)
for match in re.finditer(pattern, string):
    count += 1
    if count >= start:
        spans.append(match.span())
spans.reverse()  # Replace from the right so earlier spans stay valid
new_str = string
for sp in spans:
    new_str = new_str[0:sp[0]] + replacement + new_str[sp[1]:]
And now new_str is 'Mike\t3,434.09\ttesting_tab_testing_tab_123'.
You can wrap it in a function and repeat for every line. However, note that this GNU sed behavior isn't standard:
'NUMBER' Only replace the NUMBERth match of the REGEXP.
Note: the POSIX standard does not specify what should happen when you mix the 'g' and NUMBER modifiers, and currently there is no widely agreed upon meaning across 'sed' implementations. For GNU 'sed', the interaction is defined to be: ignore matches before the NUMBERth, and then match and replace all matches from the NUMBERth on.
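One way to wrap this in a function is to let re.sub call a replacement function that counts matches. The sketch below is an illustration of the GNU behavior quoted above, not the code from this answer; the name sub_from_nth and its 1-based n parameter are assumptions chosen to mirror sed's NUMBER.

```python
import re
from itertools import count


def sub_from_nth(pattern, replacement, string, n):
    """Replace the n-th (1-based) and all later matches, like GNU sed's s/.../.../Ng."""
    counter = count(1)  # numbers the matches 1, 2, 3, ... in order

    def repl(match):
        # Leave the first n - 1 matches untouched; replace the rest.
        return replacement if next(counter) >= n else match.group(0)

    return re.sub(pattern, repl, string)


print(repr(sub_from_nth(r"\t", "_tab_", "Mike\t3,434.09\ttesting\ttesting\t123", 3)))
```

Because repl returns the replacement string directly, it is inserted literally, without backreference expansion.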
Upvotes: 0
Reputation: 2193
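import subprocess
# Pass the arguments as a list so file paths with spaces need no shell quoting
subprocess.run(["sed", "-i", r"s/\t/_tab_/3g", file_path], check=True)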
Does this work for you? Note the -i argument in the sed command above, which makes sed modify the input file in place.
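If calling out to sed is not an option (for example, where GNU sed is not installed), the in-place edit could be approximated in pure Python with the standard fileinput module. This is a rough sketch under that assumption; collapse_tabs_in_place and its parameters are hypothetical names, not something from the question.

```python
import fileinput


def collapse_tabs_in_place(path, keep=2, filler="_tab_"):
    """Rewrite `path` in place, keeping the first `keep` tabs of each line."""
    # With inplace=True, fileinput redirects print() into the file being read.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            parts = line.split("\t", keep)
            parts[-1] = parts[-1].replace("\t", filler)
            print("\t".join(parts), end="")
```

Calling collapse_tabs_in_place(file_path) would then play the role of sed -i for a single file.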
Upvotes: 0