How to delete a part of the file with awk

I'm writing a shell script that, at some point, has to take a file, search it for a particular word, and delete all the text that comes after this word (including the word itself). I suppose awk is the right tool, but I don't really know much about programming in it.

Could anyone help me?

Upvotes: 1

Answers (6)

Ste

Reputation: 9

To delete part of a line with sed, e.g.:

$ echo '12345 John Smith / red black or blue it is a test' | sed -e 's/\/.*//'

12345 John Smith
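The same substitution works with a word instead of a slash. Below is a minimal sketch, assuming the word is passed as an argument to a hypothetical helper called cut_at. The word is pasted into a sed basic regular expression, so characters such as /, ., * or [ in it would need escaping first.

```shell
# cut_at WORD: read standard input and, on every line, delete WORD and
# everything after it. cut_at is a hypothetical helper; WORD is a sed regex.
cut_at() {
  sed -e "s/$1.*//"
}

echo '12345 John Smith / red black or blue it is a test' | cut_at John
# prints "12345 " (the space before the word is kept)
```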

Upvotes: 0

ghostdog74

Reputation: 342313

awk '/word/{exit}1' file
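Spelled out with comments, the one-liner reads as below. This sketch wraps it in a hypothetical shell function, stop_at_word, and passes the pattern in with awk's -v option (an assumption for illustration; awk interprets backslash escapes in -v values, so plain words are safest).

```shell
# stop_at_word PATTERN [FILE...]
# Hypothetical wrapper around  awk '/word/{exit}1' : print every line up to,
# but not including, the first line that matches PATTERN (an awk regex).
stop_at_word() {
  pattern=$1
  shift
  awk -v pat="$pattern" '
    $0 ~ pat { exit }   # first matching line: stop without printing it
    1                   # always-true pattern; the default action prints $0
  ' "$@"
}

# Example: prints only "alpha".
printf 'alpha\nbeta word gamma\ndelta\n' | stop_at_word word
```

Because exit stops awk from reading further, the rest of a large file is never scanned.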

Upvotes: 1

dhn

Reputation: 206

This awk one-liner should do the trick:

awk '{ sub(/ word.*/, ""); print }' file

For every line, if the line contains a pattern that starts with word (preceded by a space) and runs to the end of the line, replace that pattern with the empty string; then print the updated line.

[I figured the question could be read either way (the rest of that line, or the rest of the file). If you want to skip the rest of the file as well, you could use:

awk '{ skip = gsub(/ word.*/, ""); print; if (skip) exit }' file
]
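Both readings can be tried side by side. The function names below are hypothetical. Note that the pattern / word.*/ requires a space before the word, so a line that begins with word is left untouched.

```shell
# Per-line reading: on every line, cut from " word" to the end of the line.
cut_each_line() {
  awk '{ sub(/ word.*/, ""); print }' "$@"
}

# Whole-file reading: cut the first matching line, print it, then stop.
# gsub() returns the number of substitutions, so skip is nonzero on a match.
cut_rest_of_file() {
  awk '{ skip = gsub(/ word.*/, ""); print; if (skip) exit }' "$@"
}

printf 'one word two\nthree\nfour word five\n' | cut_each_line
# one
# three
# four
printf 'one word two\nthree\nfour word five\n' | cut_rest_of_file
# one
```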

Upvotes: 0

Jonathan Leffler

Reputation: 753555

I suppose 'awk' is one tool for the job, though I think 'sed' is simpler for this particular operation. The specification is a bit vague. The simple version is:

Find the first line containing a given word.
Delete that line and all following lines.

For that, I'd use 'sed':

sed '/word/,$d' file
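A sketch of the simple version wrapped in a hypothetical shell function that takes the word as its first argument. The word is placed inside a sed address, so it must not contain / and is treated as a basic regular expression.

```shell
# delete_from_match PATTERN [FILE...]
# Hypothetical wrapper around  sed '/word/,$d' : delete the first line that
# matches PATTERN and every line after it.
delete_from_match() {
  pattern=$1
  shift
  # \$ keeps the shell from expanding $d; sed sees the address "$" (last line).
  sed "/$pattern/,\$d" "$@"
}

printf 'keep 1\nkeep 2\nhas word here\nlost\n' | delete_from_match word
# keep 1
# keep 2
```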

The more complex version is:

Find the first line containing a given word.
Delete the text on that line from the word onwards.
Delete all subsequent lines of text.

I'd probably still use 'sed':

sed -n '1,/word/{s/word.*//;p}' file

This inverts the logic. It doesn't print anything by default, but for lines 1 through the first line containing word, it performs the substitution (which changes nothing until it reaches the line containing the word) and then prints the line.
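Running the complex version on sample input shows what it does. This sketch writes p;} rather than p}, because some non-GNU seds want a semicolon or newline before the closing brace. It also shows an edge case: when word is on line 1, the 1,/word/ range does not close on that line, since sed only looks for the end address from the next line on.

```shell
# truncate_at_word [FILE...]: hypothetical wrapper around the complex version.
truncate_at_word() {
  sed -n '1,/word/{s/word.*//;p;}' "$@"
}

# Usual case: prints "alpha", then "beta " (cut just before "word").
printf 'alpha\nbeta word gamma\ndelta\n' | truncate_at_word

# Edge case: "word" on line 1. The range stays open until the next line
# containing "word", so this prints "", "two" and "three ".
printf 'word one\ntwo\nthree word\nfour\n' | truncate_at_word
```

GNU sed accepts 0,/word/ as a non-standard address that does close on line 1.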

Can it be done in 'awk'? Not quite so trivially, because 'awk' automatically splits input lines into fields, and because you have to use functions such as sub() to do substitutions.

awk '/word/ { if (found == 0) {
                # First line with word
                sub("word.*", "")
                print $0;
                found = 1
              }
            }
            { if (found == 0) print $0; }' file

(Edited: changed 'delete' to 'found', since 'delete' is a reserved word in 'awk'.)
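If the word comes from a shell variable and may contain characters such as . or *, a variant of this awk script can match it as a plain string with index() instead of a regex. A sketch, with a hypothetical function name; it also stops reading at the first match. (awk still interprets backslash escapes in -v values.)

```shell
# truncate_at_literal WORD [FILE...]
# Hypothetical variant of the awk script: WORD is matched as a plain string
# with index(), so "." or "*" in it need no escaping.
truncate_at_literal() {
  w=$1
  shift
  awk -v w="$w" '
    {
      i = index($0, w)               # 0 when WORD is not on this line
      if (i > 0) {
        print substr($0, 1, i - 1)   # the text before WORD
        exit                         # ignore the rest of the input
      }
      print                          # lines before the first match
    }
  ' "$@"
}

# Prints "a.b" (no literal "a.b*" there), then "x ".
printf 'a.b\nx a.b* y\nz\n' | truncate_at_literal 'a.b*'
```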

In all these examples, the truncated version of the input file is written to standard output. To modify the file in situ, you either need to use Perl, Python or a similar language, or you capture the output in a temporary file that you copy over the original once the command has completed. (If you try 'script < file > file', the shell truncates file before the script reads it, so you process an empty file.)
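The temporary-file approach can be sketched as a small shell function. The name in_place is hypothetical, and mktemp is assumed to be available (it is common, but not required by POSIX). The command reads the original file and writes to the temporary one, so nothing is truncated before it is read.

```shell
# in_place FILE CMD [ARG...]
# Hypothetical helper: run CMD with FILE on standard input, capture the
# output in a temporary file, and copy it over FILE only if CMD succeeded.
in_place() {
  file=$1
  shift
  tmp=$(mktemp) || return 1
  if "$@" < "$file" > "$tmp"; then
    cp "$tmp" "$file"     # copying keeps FILE's own permissions and owner
    status=$?
  else
    status=1              # CMD failed: leave FILE untouched
  fi
  rm -f "$tmp"
  return "$status"
}

# Example (notes.txt is a hypothetical file):
# in_place notes.txt sed '/word/,$d'
```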

There are various early exit optimizations that could be applied to the sed and awk scripts, such as:

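sed -n '/word/q;p' file             # simple version: stop before the line containing word
sed '/word/{s/word.*//;q;}' file    # complex version: cut that line at word, then stop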
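The awk script can exit early in the same spirit: cut the first matching line, print it, and stop. A sketch with a hypothetical function name, using the return value of sub() (the number of substitutions made) to detect the match; the regex is built at run time from a -v variable.

```shell
# truncate_early_exit PATTERN [FILE...]
# Hypothetical early-exit version of the "complex" operation in awk.
# PATTERN is an extended regex; "(" pat ").*" extends it to the end of line.
truncate_early_exit() {
  pattern=$1
  shift
  awk -v pat="$pattern" '
    {
      if (sub("(" pat ").*", "")) {   # sub() returns the number of substitutions
        print                         # the matching line, cut before PATTERN
        exit                          # stop reading: nothing after it is needed
      }
      print                           # lines before the first match
    }
  ' "$@"
}

# Prints "alpha", then "beta ".
printf 'alpha\nbeta word gamma\ndelta\n' | truncate_early_exit word
```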

And, if you can assume the GNU versions of awk or sed, there are non-standard extensions that help with in-situ modification of the file, such as GNU sed's -i option and gawk's inplace extension (gawk -i inplace).

Upvotes: 8

Stobor

Reputation: 45122

I'm assuming your input is something like this:

Lorem ipsum dolor sit amet,
consectetur adipiscing velit.
Nullam neque sapien, molestie vel congue non,
feugiat quis tellus. Ut quis
nulla mi. Maecenas a ligula.

and you want the output to be cut off at the word 'vel' like so:

Lorem ipsum dolor sit amet,
consectetur adipiscing velit.
Nullam neque sapien, molestie

In that case, your awk script would be:

awk '
  /\<vel\>/ {
     print substr($0, 0, match($0, /\<vel\>/) - 1)
     exit
  }

  { print }
' lorem.txt

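Replace both instances of the word vel in the script with the word you want to cut off at. The \< and \> word-boundary operators (which stop velit from matching) are GNU awk extensions, so other awk implementations may not support them.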

You can safely put the entire script on one line, too.
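For awks without word-boundary operators, here is a portable sketch (the function name is hypothetical) that finds the word as a literal string with index() and accepts a hit only when the neighboring characters are not letters, digits or underscores, so velit does not match vel.

```shell
# cut_at_whole_word WORD [FILE...]
# Hypothetical portable variant: print lines up to the first whole-word
# occurrence of WORD, cut just before it, then stop.
cut_at_whole_word() {
  w=$1
  shift
  awk -v w="$w" '
    function is_word_char(c) { return c ~ /[A-Za-z0-9_]/ }
    {
      rest = $0; offset = 0
      while ((i = index(rest, w)) > 0) {
        pos = offset + i                              # position in $0
        before = (pos > 1) ? substr($0, pos - 1, 1) : ""
        after = substr($0, pos + length(w), 1)        # "" past end of line
        if (!is_word_char(before) && !is_word_char(after)) {
          print substr($0, 1, pos - 1)                # keeps the space before WORD
          exit
        }
        offset = pos                                  # keep scanning after this hit
        rest = substr($0, pos + 1)
      }
      print
    }
  ' "$@"
}

# Usage with this answer's sample text (lorem.txt is hypothetical):
# cut_at_whole_word vel lorem.txt
```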

Upvotes: 1

Adam Rosenfield

Reputation: 400174

I'm not sure how to do it with awk, but you could do it with sed:

sed -i~ -e 's/the-word-to-find.*$//' the-file

This will delete everything from the-word-to-find to the end of the line, on every line that contains the-word-to-find. If you want to delete the rest of the file upon the first occurrence of the-word-to-find, you could do:

sed -i~ -e 's/\(the-word-to-find\).*$/\1/;/the-word-to-find/,$d' the-file
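To see what the second command does, here is a demo on a scratch file (the function name is hypothetical, and the input lines are made up). Because the d command removes the whole matching line, the text before the-word-to-find on that line is deleted too, and -i~ leaves the original as a backup with ~ appended to its name.

```shell
# show_in_place_edit: hypothetical demo. Runs the second command on a
# scratch file, then prints the edited file and the ~ backup.
show_in_place_edit() {
  f=$(mktemp) || return 1
  printf 'keep this\nbefore the-word-to-find after\nlost\n' > "$f"
  sed -i~ -e 's/\(the-word-to-find\).*$/\1/;/the-word-to-find/,$d' "$f"
  echo "edited:"; cat "$f"      # only "keep this" is left
  echo "backup:"; cat "$f~"     # the original three lines
  rm -f "$f" "$f~"
}

show_in_place_edit
```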

Upvotes: 0
