Reputation: 1394
I am quite new to shell scripting.
I am scraping a website, and the scraped text contains a lot of repetition, usually things like the menus of a forum. I mostly do this in Python, but I thought the sed command would save me from reading and printing the input, writing loops, etc. I want to delete thousands of repeated lines from the same single file. I do not want to copy it to another file, because I would end up with 100 new files. The following is the script, which I run from the bash shell.
#!/bin/sed -f
sed -i '/^how$/d' input_file.txt
sed -i '/^is test$/d' input_file.txt
sed -i '/^repeated text/d' input_file.txt
This is the content of the input file:
how to do this task
why it is not working
this is test
Stackoverflow is a very helpful community of programmers
that is test
this is text
repeated text is common
this is repeated text of the above line
Then I run in the shell the following command:
sed -f scriptFile input_file.txt
I get the following error
sed: scriptFile line 2: unterminated `s' command
How can I correct the script, and what is the correct syntax of the command I should use to get it to work?
Any help is highly appreciated.
Upvotes: 1
Views: 67
Reputation: 22225
Wouldn't it be easier to do it with egrep followed by a mv? For example:
egrep -v 'pattern1|pattern2|pattern3|...' <input_file.txt >tmpfile.txt
mv tmpfile.txt input_file.txt
Each pattern describes the lines to be deleted, much as in sed. You would not end up with additional files, because the mv replaces the original file with the temporary one.
If you have so many patterns that you don't want to specify them directly on the command line, you can store them in a file and use the -f option of egrep.
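A minimal sketch of the pattern-file variant, assuming a grep that supports -E (the modern spelling of egrep). The file names patterns.txt and tmpfile.txt are placeholders, the sample input is a few lines from the question, and the patterns are the ones the question tries to delete:

```shell
#!/bin/sh
# Sample input: a few lines taken from the question.
cat > input_file.txt <<'EOF'
how to do this task
this is test
repeated text is common
this is repeated text of the above line
EOF

# One extended regular expression per line (hypothetical file name).
# A line matching any of them is dropped.
cat > patterns.txt <<'EOF'
^how$
^is test$
^repeated text
EOF

# -v inverts the match, -E enables extended regexes, -f reads the patterns from a file.
# The mv only runs if grep printed at least one line and hit no error,
# so the original is left untouched when nothing would remain.
grep -v -E -f patterns.txt input_file.txt > tmpfile.txt && mv tmpfile.txt input_file.txt

cat input_file.txt
```

With this sample only the line beginning with "repeated text" is removed, because the other two patterns are anchored to match whole lines that do not occur in the input.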
Upvotes: 0
Reputation: 6333
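Assuming you know what your script is doing, it's very easy to put the commands into a script file. A file passed to sed -f must contain only the sed commands themselves, not the surrounding sed -i '...' input_file.txt command line; that is why sed reports an error on line 2. In your case, the script should be: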
/^how$/d
/^is test$/d
/^repeated text/d
That's all it needs.
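To make the script itself executable, add a shebang line. Use the absolute path of sed (adjust it if sed lives elsewhere on your system), because on Linux a first line of #!/usr/bin/env sed -f hands "sed -f" to env as a single argument and fails:
#!/bin/sed -f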
/^how$/d
/^is test$/d
/^repeated text/d
Then make it executable and run it:
chmod +x your_sed_script
./your_sed_script <old >new
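Since the question asks for the file to be changed in place, the script file can also be combined with -i. This is a sketch that assumes GNU sed (BSD/macOS sed expects -i '' instead); scriptFile and input_file.txt are the names used in the question, and the sample input is a few of its lines:

```shell
#!/bin/sh
# Sample input: a few lines taken from the question.
cat > input_file.txt <<'EOF'
how to do this task
this is test
repeated text is common
this is repeated text of the above line
EOF

# The script file holds bare sed commands, one per line:
# no "sed", no quotes, no input file name.
cat > scriptFile <<'EOF'
/^how$/d
/^is test$/d
/^repeated text/d
EOF

# -f reads the commands from scriptFile;
# -i (GNU sed) rewrites input_file.txt in place instead of printing to stdout.
sed -i -f scriptFile input_file.txt

cat input_file.txt
```

No extra copy of the input is left behind, and all delete commands are applied in a single pass over the file.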
Here is a very good and compact tutorial; you can learn a lot from it.
The following is an example from the site, in case the link is dead:
If you have a large number of sed commands, you can put them into a file and use
sed -f sedscript <old >new
where sedscript could look like this:
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
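To try the quoted example end to end, something like the following should work. The file names sedscript, old and new come from the quoted text; the sample sentence is made up for illustration:

```shell
#!/bin/sh
# Write the script file exactly as quoted above.
cat > sedscript <<'EOF'
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
EOF

# A made-up input line, just for illustration.
printf 'this is repeated text\n' > old

# Every command in sedscript is applied, in order, to each input line.
sed -f sedscript <old >new

cat new
```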
Upvotes: 3