htz

Reputation: 1047

Delete lines in a file that do not match a pattern

I am trying to migrate data that consists of a lot of separate text files. One step is to delete all lines in the text files that are no longer used. The lines are key-value pairs. I want to delete everything in a file except the lines with certain keys. I do not know the order of the keys within a file.

The keys I want to keep are e.g. version, date and number.

I found the question "Remove all lines except matching pattern line best practice (sed)" and tried the accepted answer. My sed command is

sed '/^(version=.*$)|(date=.*$)|(number=.*$)/!d' file.txt

with a !d after the address to delete all lines NOT matching the pattern.

Example of the regex: https://regex101.com/r/LKfxpP/2

However, it keeps deleting all lines in my file. Where is my mistake? I assume my regex is wrong, but what is the error?

Upvotes: 2

Views: 1475

Answers (2)

oliv

Reputation: 13259

Using awk:

awk -F= '$1 ~ /^(version|date|number)$/' file.txt

The field separator is set to =, and a line is printed only if its first field (the key) is exactly version, date or number.
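As a quick check of what an awk filter on the first field keeps, here is a minimal sketch on a made-up sample file (the keys author and comment are hypothetical). It uses ~ (match) with an anchored pattern, so only lines whose key is exactly one of the wanted keys are printed; !~ would print the other lines instead.

```shell
# Hypothetical sample input; "author" and "comment" stand in for unwanted keys.
tmpdir=$(mktemp -d)
printf '%s\n' 'author=htz' 'version=1.2' 'comment=old' 'date=2019-01-01' 'number=42' > "$tmpdir/file.txt"

# -F= splits each line at "=", so $1 is the key.
# The anchors ^ and $ make sure a key such as "update" does not match "date".
result=$(awk -F= '$1 ~ /^(version|date|number)$/' "$tmpdir/file.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample, the version, date and number lines are printed in their original order.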

Upvotes: 1

Wiktor Stribiżew

Reputation: 627488

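sed uses POSIX BRE syntax by default, where unescaped (, ) and | match literal characters, so your pattern only matches lines that literally start with (version=. Escape them instead: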

sed '/^\(version\|date\|number\)=/!d' file.txt > newfile.txt

This POSIX BRE pattern (\| for alternation is a GNU extension) matches

  • ^ - start of a line
  • \(version\|date\|number\) - a group matching
    • version - a version string
    • \| - or
    • date - a date string
    • \| - or
    • number - a number string
  • = - a = char.
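To see the effect of the escaping, here is a small sketch on a made-up sample file. It assumes GNU sed, since \| is not part of standard BRE, and compares the unescaped pattern from the question with the escaped one.

```shell
# Hypothetical sample input for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' 'author=htz' 'version=1.2' 'comment=old' 'date=2019-01-01' 'number=42' > "$tmpdir/file.txt"

# Unescaped ( ) | are literal characters in BRE: no line matches, so every line is deleted.
unescaped=$(sed '/^(version=.*$)|(date=.*$)|(number=.*$)/!d' "$tmpdir/file.txt")

# Escaped \( \) \| form a group with alternation (GNU sed).
escaped=$(sed '/^\(version\|date\|number\)=/!d' "$tmpdir/file.txt")

printf 'unescaped: [%s]\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
rm -rf "$tmpdir"
```

The unescaped variant leaves nothing, while the escaped one keeps the version, date and number lines.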

Or use POSIX ERE syntax, enabled with the -E option:

sed -E '/^(version|date|number)=/!d' file.txt > newfile.txt

Here, the alternation operator | and capturing parentheses do not need escaping.
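Since the question involves many files, the ERE command could be applied in a loop. This is only a sketch: the directories and file names below are hypothetical, and it writes filtered copies instead of editing in place so the originals stay untouched.

```shell
# Hypothetical layout: input files in "$src", filtered copies written to "$dst".
src=$(mktemp -d)
dst=$(mktemp -d)
printf '%s\n' 'version=1.2' 'author=htz' 'number=42' > "$src/a.txt"
printf '%s\n' 'comment=old' 'date=2019-01-01' > "$src/b.txt"

for f in "$src"/*.txt; do
    # -E enables ERE, so ( ) and | work unescaped.
    sed -E '/^(version|date|number)=/!d' "$f" > "$dst/$(basename "$f")"
done

result_a=$(cat "$dst/a.txt")
result_b=$(cat "$dst/b.txt")
printf '%s\n' "$result_a" "$result_b"
rm -rf "$src" "$dst"
```

Each output file contains only the lines whose key is version, date or number, in the order they appeared in the input.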

Upvotes: 1
