Reputation: 680
Given a file that contains:
foo/bar
foo/bar/gaz
foo/bar/urk
hello/world
hello/world/congress
hello/world/united/states
hello/world
How can I remove lines which have previous lines as substrings?
For example, foo/bar/gaz has foo/bar (a previous line) as a substring, and should be removed.
The above list should be reduced to,
foo/bar
hello/world
(This is kind of like a common denominator for the lines in a file.)
Upvotes: 1
Views: 90
Reputation: 58578
This might work for you (GNU sed):
sed -E 'G;/^([^\n]+).*\n\1(\n.*)*$/d;h;P;d' file
Keep the unique lines in the hold space and delete any line that starts with, or is identical to, one of them.
Upvotes: 2
Reputation: 104092
Here is an awk that may be faster if your file is larger:
awk 'BEGIN { FS=OFS="/" }
$0 in arr { next }
{
    s = $1
    for (i = 2; i <= NF; i++) {
        if (s in arr || (s OFS $i) in arr) next
        s = s OFS $i
    }
    arr[$0]
} 1' file
Instead of looping over the entire array contents for each line of input, this loops over the leading path prefixes of each line and tests each one for presence in the array of previously kept lines.
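To make the lookup keys concrete, here is a minimal sketch that prints the prefixes this approach builds for a single line. The function name list_prefixes is hypothetical and only for illustration; the loop mirrors the one in the answer.

```shell
# Hypothetical helper: print every "/"-delimited prefix of the path in $1.
# These are the keys the awk answer looks up in its array of kept lines.
list_prefixes() {
  printf '%s\n' "$1" | awk 'BEGIN { FS = OFS = "/" }
  {
    s = $1
    print s                      # first component on its own
    for (i = 2; i <= NF; i++) {
      s = s OFS $i               # extend the prefix by one component
      print s
    }
  }'
}

list_prefixes hello/world/united/states
# hello
# hello/world
# hello/world/united
# hello/world/united/states
```

Because only whole-component prefixes are looked up, a kept line foo/bar would not cause foo/barn/x to be dropped, which differs from a plain substring or string-prefix test.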
Upvotes: 1
Reputation: 20032
When you have a line foo/bar, you want to delete every line matching the pattern foo/bar. where the trailing dot stands for any one character.
Just add a dot to every line and use the result as the exclusion list.
grep -vf <(sed 's/$/./' file) file
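With the sample input from the question, the exact duplicate hello/world survives this filter, since hello/world. needs one more character to match. A possible sketch that also collapses exact duplicates is below; the function name drop_extended_lines and the temporary pattern file are assumptions for illustration, not part of the answer.

```shell
# Hypothetical helper: print the lines of file "$1" that do not extend another line.
drop_extended_lines() {
  patterns=$(mktemp)
  # Append "." to each line: foo/bar becomes the pattern foo/bar.
  sed 's/$/./' "$1" > "$patterns"
  # Drop every line matching some pattern, then keep only the first
  # occurrence of each remaining line.
  grep -vf "$patterns" "$1" | awk '!seen[$0]++'
  rm -f "$patterns"
}
```

Usage would be drop_extended_lines file. Note that the patterns are still unanchored regular expressions, so characters such as . or * inside the lines themselves are interpreted by grep rather than matched literally.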
Upvotes: 1
Reputation: 113994
Try:
$ awk '{for (s in a) if (s == substr($0,1,length(s))) next; print; a[$0]}' file
foo/bar
hello/world
The previously seen lines, excluding those that begin with an earlier line, are the keys of array a. for (s in a) if (s == substr($0,1,length(s))) next checks whether any previous line s is a prefix of the current line, $0. If so, we skip this line and jump to the next line.
If no previous line is a prefix of the current line, then we print it and add it as a key of a.
$ cat file2
/etc
/foo/bar/etc
$ awk '{for (s in a) if (s == substr($0,1,length(s))) next; print; a[$0]}' file2
/etc
/foo/bar/etc
The code in this answer treats the "common denominator" as starting from the beginning of the string. Thus /etc is not a "common denominator" for /foo/bar/etc even though both have the common substring /etc.
Upvotes: 2
Reputation: 5975
You can use awk.
awk '{for (i in a) if ($0 ~ i) next} {a[$0]}1' file
Output:
foo/bar
hello/world
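Note that $0 ~ i treats each stored line as a regular expression, so a line containing characters such as . or * may match more than intended. A minimal sketch of a literal variant, assuming plain substring matching is what is wanted, uses index() instead; the function name dedupe_by_substring is hypothetical.

```shell
# Hypothetical wrapper: drop any line that contains an earlier kept line
# as a literal substring (no regex interpretation).
dedupe_by_substring() {
  awk '{ for (s in seen) if (index($0, s) > 0) next }   # earlier line found inside $0
       { seen[$0] } 1'                                  # remember and print the line
}

printf '%s\n' 'a.c' 'abc' 'xa.cx' | dedupe_by_substring
# a.c
# abc
```

Here abc is kept because it does not literally contain a.c, whereas the regex version would have dropped it.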
Upvotes: 3