user912475

Delete all lines in a text file that do not contain a string

I have a text file in which each line is a file path. I would like to:

  1. Read this txt file (line-by-line).
  2. Delete all lines that do not end with ,-,.txt
  3. In the remaining lines, delete everything from after the last / to the ,-,.txt.
  4. Write the output to a new txt.

How could this be done with sed?

Input:

/a/b1/
/a/b1/car
/a/b1/car/bil/
/a/b1/car/bil/,-,.txt
/a/b2/
/a/b2/flower
/a/b2/flower/bil/
/a/b2/flower/bil/,-,.txt
/a/b2/
/a/b2/boat
/a/b2/boat/baat/
/a/b2/boat/baat/abc,-,.txt

Second step:

/a/b1/car/bil/,-,.txt
/a/b2/flower/bil/,-,.txt
/a/b2/boat/baat/abc,-,.txt

Third step/desired output:

/a/b1/car/bil/
/a/b2/flower/bil/
/a/b2/boat/baat/

Upvotes: 2

Views: 4018

Answers (7)

potong

Reputation: 58371

This might work for you:

sed 's/[^/]*,-,\.txt$//p;d' file

Upvotes: 0

kev

Reputation: 161614

$ grep -oP '.*/(?=[^/]*,-,\.txt$)' input.txt
/a/b1/car/bil/
/a/b2/flower/bil/
/a/b2/boat/baat/

Upvotes: 1

Kent

Reputation: 195029

In your question you showed two steps. Is it acceptable to do it in one shot with a sed one-liner?

sed -r '/,-,\.txt$/!d; s#/[^/]*$#/#' yourFile

This works with your example data.

See the test below:

kent$  cat t.txt
/a/b1/
/a/b1/car
/a/b1/car/bil/
/a/b1/car/bil/,-,.txt
/a/b2/
/a/b2/flower
/a/b2/flower/bil/
/a/b2/flower/bil/,-,.txt
/a/b2/
/a/b2/boat
/a/b2/boat/baat/
/a/b2/boat/baat/abc,-,.txt

kent$  sed -r '/,-,\.txt$/!d; s#/[^/]*$#/#' t.txt
/a/b1/car/bil/
/a/b2/flower/bil/
/a/b2/boat/baat/

Upvotes: 0

Tshirtman

Reputation: 5949

Does it need to be sed? I would use Python for this kind of thing; sed quickly becomes overcomplicated.

#!/usr/bin/env python3
import sys


def main(fin, fout):
    # Keep only lines ending in ",-,.txt" and cut each one after its last "/".
    with open(fin) as f:
        lines = []
        for line in f:
            line = line.rstrip('\n')  # also handles a final line without a newline
            if line.endswith(',-,.txt'):
                lines.append('/'.join(line.split('/')[:-1]) + '/\n')

    with open(fout, 'w') as f:
        f.writelines(lines)


def usage():
    print(sys.argv[0], "filename new_file")
    print('remove all lines not ending with ",-,.txt"')
    print('print the resulting lines, up to their last "/" to new file')


if __name__ == '__main__':
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        usage()

Tested with the sample input; the output is:

/a/b1/car/bil/
/a/b2/flower/bil/
/a/b2/boat/baat/

Upvotes: 1

Keith Thompson

Reputation: 263197

sed -n '/,-,\.txt$/s|/[^/]*$||p' input.txt > output.txt

What it does:

It reads input.txt one line at a time; -n tells sed not to print lines by default. For each line that matches the pattern ,-,\.txt$, the s command deletes a / character followed by zero or more non-/ characters up to the end of the line (that is, everything from the last / to the end of the line), and the p flag prints the result. I use | as the delimiter so I don't have to escape the /.

This is a fairly straightforward rendition of your requirements.

Now that you've posted sample input and output, I see that you want to keep the final / (which is inconsistent with your requirement "delete everything from the last / to the ,-,.txt"). To do that:

sed -n '/,-,\.txt$/s|/[^/]*$|/|p' input.txt > output.txt

This produces your expected results given your sample input.
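For readers who find the address-plus-substitution form hard to parse, here is a rough sketch of the same two steps in Python. It is only an illustration of the logic, not a replacement for the sed command; the SAMPLE text and the function name filter_paths are made up for this example.

```python
import re

# A shortened version of the sample input from the question.
SAMPLE = """\
/a/b1/
/a/b1/car
/a/b1/car/bil/
/a/b1/car/bil/,-,.txt
/a/b2/boat/baat/
/a/b2/boat/baat/abc,-,.txt
"""


def filter_paths(text):
    # Hypothetical helper mirroring the sed command with the "/" kept.
    kept = []
    for line in text.splitlines():
        # Address part: only lines ending in ",-,.txt" are processed.
        if re.search(r',-,\.txt$', line):
            # Substitution part: replace the last "/" and everything
            # after it with a single "/".
            kept.append(re.sub(r'/[^/]*$', '/', line))
    return kept


result = filter_paths(SAMPLE)
print("\n".join(result))
```

The if test plays the role of the /,-,\.txt$/ address, and re.sub plays the role of s|/[^/]*$|/|; lines that fail the test are simply never printed, which is what -n together with the p flag achieves in sed.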

If I were doing this on the fly, I might use a simpler approach, combining sed and grep:

grep ',-,\.txt$' input.txt | sed 's|/[^/]*$|/|' > output.txt

Upvotes: 3

jcollado

Reputation: 40374

This should do the job:

sed -r '/,-,\.txt$/!d' file | awk -F/ -v OFS=/ '{$NF=""; print}'

Notes:

  • The sed command removes the lines that don't match the pattern (!d).
  • The awk command splits each line on / and blanks the last field, so everything after the last / is dropped and the trailing / is kept. Splitting on commas with -F, and printing the first field is not enough here, because it would leave abc at the end of the last sample line.
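To make the field splitting concrete, here is a small Python sketch that applies both a comma split and a slash split to the three matching sample lines. The variable names are only for illustration.

```python
# The three lines that survive the sed filter.
lines = [
    "/a/b1/car/bil/,-,.txt",
    "/a/b2/flower/bil/,-,.txt",
    "/a/b2/boat/baat/abc,-,.txt",
]

# Split on "," and take the first field, like awk -F, '{print $1}'.
by_comma = [line.split(",")[0] for line in lines]

# Split on "/" and drop the last field, keeping the path up to the last "/".
by_slash = ["/".join(line.split("/")[:-1]) + "/" for line in lines]

for comma, slash in zip(by_comma, by_slash):
    print(comma, slash)
```

The two approaches agree on the first two lines. On the third, the comma split yields /a/b2/boat/baat/abc, while the slash split yields /a/b2/boat/baat/, which is the output the question asks for.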

Upvotes: 0

user unknown

Reputation: 36229

echo -e "foo,-,.txt\nbar,-,.png" | sed -rn '/,-,\.txt/{s/^(.*),-,\.txt$/\1/p}'

Explanation:

sed -rn : 
    -r  : use extended regular expressions, which allow (.*) as a
          capturing group without escaping the parentheses.
    -n  : no output by default
    '/pattern/{list of commands}' : run the commands only on lines
       matching pattern.
    {s/pattern/replacement/p} : substitute pattern with replacement,
       then print.
    /^(.*)foo$/ : from line start ^ to line end $, with anything
       before foo being captured, to be output with \1
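As a rough illustration of the capturing group, the same idea can be sketched in Python. The function name strip_suffix is hypothetical; m.group(1) plays the role of \1 in the sed replacement.

```python
import re


def strip_suffix(lines):
    # Hypothetical helper: keep only lines ending in ",-,.txt" and
    # return the part captured before that suffix.
    out = []
    for line in lines:
        m = re.search(r'^(.*),-,\.txt$', line)
        if m:
            out.append(m.group(1))
    return out


result = strip_suffix(["foo,-,.txt", "bar,-,.png"])
print(result)
```

Only foo,-,.txt matches, so the result contains the single captured string foo; the .png line is skipped, just as -n suppresses it in sed.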

Upvotes: 1
