Find and Edit Text File

Question

I'm looking to find if there is a way of automating this process. Basically I have 300,000 rows of data needed to download on a daily basis. There are a couple of rows that need to be edited before it can be uploaded to SQL.

Jordan || Michael | 23 | Bulls | Chicago

Bryant | Kobe ||| 8 || LA

What I want to accomplish is to just have 4 vertical bars per row. Normally, I would search for a keyword then edit it manually then save. These two are the only anomalies in my data.

Find "Jordan", then remove the excess 1 vertical bar "|" right after it.
I need to find "Kobe", then remove the two excess vertical bars "|" right after it.

Correct format is below -

Jordan | Michael | 23 | Bulls | Chicago

Bryant | Kobe | 8 || LA

Not sure if this can be done in vbscript or Python. Any help would be appreciated. Thanks!

John1024 · Accepted Answer

Python or vbscript could be used but they are overkill for something this simple. Try sed:

$ sed -E 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file 
Jordan | Michael | 23 | Bulls | Chicago
Bryant | Kobe | 8 || LA

To save to a new file:

sed -E 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file >newfile

Or, to change the existing file in-place:

sed -Ei.bak 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file

How it works

sed reads and processes a file line by line. In our case, we need only the substitute command which has the form s/old/new/g where old is a regular expression and, if it is found, it is replaced by new. The optional g at the end of the command tells sed to perform the substitution command 'globally', meaning not just once but as many times as it appears on the line.

s/(Jordan *)\|/\1/g

This tells sed to look for Jordan followed by zero or more spaces followed by a vertical bar and remove the vertical bar.

In more detail, the parens in (Jordan *) tell sed to save the string Jordan followed by zero or more spaces as a group. In the replacement side, we reference that group as \1.
s/(Kobe *)\| *\|/\1/g

Similarly, this tells sed to look for Kobe followed by zero or more spaces followed by a vertical bar and remove the vertical bar.

Using python

Using the same logic as above, here is a python program:

$ cat kobe.py
import re
with open('file') as f:
    for line in f:
        line = re.sub(r'(Jordan *)\|', r'\1', line)
        line = re.sub(r'(Kobe *)\| *\|', r'\1', line)
        print(line.rstrip('
'))
$ python kobe.py
Jordan | Michael | 23 | Bulls | Chicago
Bryant | Kobe | 8 || LA

To save that to a new file:

python kobe.py >newfile

Find and Edit Text File

Answers (2)

How it works

Using python

Related Questions