Mohammed

Reputation: 1394

Deleting all lines that match a pattern with sed (Linux Mint 17)

I am quite new to shell scripting.

I am scraping a website, and the scraped text contains a lot of repetition, usually the menus of a forum, for example. I mostly do this kind of cleanup in Python, but I thought the sed command would save me from reading and printing the input, writing loops, etc. I want to delete thousands of repeated lines from the same single file. I do not want to copy it to another file, because I will end up with 100 new files. The following is the script that I run from the bash shell.

#!/bin/sed -f
sed -i '/^how$/d' input_file.txt
sed -i '/^is test$/d' input_file.txt
sed -i '/^repeated text/d' input_file.txt

This is the content of the input file:

how to do this task
why it is not working
this is test
Stackoverflow is a very helpful community of programmers
that is test
this is text
repeated text is common
this is repeated text of the above line

Then I run in the shell the following command:

sed -f scriptFile input_file.txt

I get the following error

sed: scriptFile line 2: unterminated `s' command

How can I correct the script, and what is the correct syntax of the command I should use to get it to work?

Any help is highly appreciated.

Upvotes: 1

Views: 67

Answers (2)

user1934428

Reputation: 22225

Wouldn't it be easier to do it with egrep followed by mv? For example:

egrep -v 'pattern1|pattern2|pattern3|...' <input_file.txt >tmpfile.txt
mv tmpfile.txt input_file.txt

Each pattern would describe the lines being deleted, much like in sed. You would not end up with additional files, because the mv removes them.

If you have so many patterns that you don't want to specify them directly on the command line, you can store them in a file and use the -f option of egrep.
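A minimal sketch of the pattern-file variant, assuming GNU grep (grep -E is the same as egrep). The file name patterns.txt and the three patterns are made up for illustration, and the input is a shortened copy of the sample from the question, created in a scratch directory:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample input (four lines from the question).
printf '%s\n' \
  'how to do this task' \
  'this is test' \
  'this is text' \
  'repeated text is common' > input_file.txt

# One extended regular expression per line (hypothetical patterns).
cat > patterns.txt <<'EOF'
^how to
is test$
^repeated text
EOF

# -v keeps only the lines that match none of the patterns read with -f.
grep -E -v -f patterns.txt input_file.txt > tmpfile.txt
mv tmpfile.txt input_file.txt

cat input_file.txt
```

With this input only the line "this is text" should survive, since each of the other three lines matches one of the patterns.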

Upvotes: 0

Jason Hu

Reputation: 6333

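Assuming you know what your commands are doing, it's very easy to put them into a script. A file passed with -f must contain only sed commands, not whole "sed -i '...' file" command lines; sed read the leading "s" of "sed" on line 2 as a substitute command, hence the error. In your case, the script should be: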

/^how$/d
/^is test$/d
/^repeated text/d

that's good enough.
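For reference, here is a sketch of the whole round trip, assuming GNU sed (as shipped with Linux Mint), where -i edits the file in place and can be combined with -f. The patterns are loosened for illustration, because the question's /^how$/ and /^is test$/ only match lines that consist of exactly "how" or "is test", and the sample input has no such lines:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
cat > input_file.txt <<'EOF'
how to do this task
why it is not working
this is test
Stackoverflow is a very helpful community of programmers
that is test
this is text
repeated text is common
this is repeated text of the above line
EOF

# Script file: bare sed commands only, one per line.
# These patterns are illustrative assumptions, not the asker's exact ones.
cat > scriptFile <<'EOF'
/^how to/d
/is test$/d
/^repeated text/d
EOF

# -i edits input_file.txt in place (GNU sed); -f reads the commands from scriptFile.
sed -i -f scriptFile input_file.txt

cat input_file.txt
```

Four lines should remain: the "why it is not working" line, the Stackoverflow line, "this is text", and the last line, which contains "repeated text" but does not start with it.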

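Making the script itself executable is easy too. Point the shebang at sed directly: Linux passes everything after the interpreter path as a single argument, so "#!/usr/bin/env sed -f" would make env look for a program named "sed -f".

#!/bin/sed -f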
/^how$/d
/^is test$/d
/^repeated text/d

then

chmod +x your_sed_script
./your_sed_script <old >new

here is a very good and compact tutorial. you can learn a lot from it.

following is an example from the site, just in case the link is dead:

If you have a large number of sed commands, you can put them into a file and use

sed -f sedscript <old >new

where sedscript could look like this:

# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g

Upvotes: 3
