
Grep: find lines only matching unknown character once

I have a list of hexadecimal lines. For example:

0b 5a 3f 5a 7d d0 5d e6 2b c4 7e 7d c2 c0 e6 9a 
84 bd aa 74 f3 85 da 9d ac b6 e0 b6 62 0f b5 d5
c0 b0 f5 60 02 8b 1c a4 41 7c 53 f2 85 20 a0 d1
...

I'm trying to use grep to find all the lines in which some character occurs only once.

For example, there is only one 'd' in the third line.

I tried this, but it's not working:

egrep '^.*([a-f0-9])[^\1]*$'
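The attempt fails because back-references are not recognised inside a bracket expression. There a backslash is an ordinary character, so [^\1] means "any character except \ and 1", not "any character other than the one captured by group 1". A small check, assuming GNU grep, makes this visible:

```shell
# Inside a bracket expression '\' is literal, so [^\1] excludes only '\' and '1'.
printf 'ab\n'  | grep -cE '^[^\1]*$'                      # 1: no '\' or '1' present
printf 'a1b\n' | grep -E '^[^\1]*$' || echo 'no match'    # the '1' is excluded

# Hence the original pattern accepts any line that contains a hex digit
# (and no backslash), e.g. this one, where both 'a' and 'b' occur twice:
printf 'aa bb\n' | grep -E '^.*([a-f0-9])[^\1]*$'         # prints: aa bb
```

The pattern succeeds by letting group 1 capture the last hex digit on the line; nothing after it can be a '1', so [^\1]* always reaches the end of the line.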

Upvotes: 3

Answers (3)

hek2mgl

Reputation: 157947

I don't know of a way to do this with a regex. However, you can use this simple awk script:

awk -F '' '{delete a; for(i=1;i<=NF;i++) a[$i]++; for(c in a) if(a[c]==1){print; next}}' input

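The script counts the occurrences of every character in the line; the delete a at the start resets the counts, so each line is counted on its own. At the end of the line it checks all the totals and prints the line if at least one of them equals 1. Note that splitting a record into single characters with an empty field separator (-F '') works in GNU awk and mawk but is not specified by POSIX.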
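The same per-line counting can be written without relying on -F ''. The sketch below walks the line with substr() and, as an assumption about what should count, tallies only hex digits so that spaces are ignored; the function name unique_hex_lines is hypothetical.

```shell
# Hypothetical helper: print lines (from stdin) in which some hex digit occurs once.
# Portable: uses substr() instead of the empty field separator.
unique_hex_lines() {
    awk '{
        split("", count)                         # reset the counts for this line
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[0-9a-f]/) count[c]++       # tally hex digits only
        }
        for (c in count)
            if (count[c] == 1) { print; next }   # some digit occurs exactly once
    }'
}

printf '%s\n' 'aa bb' '0b 5a' | unique_hex_lines   # prints: 0b 5a
```

On a file it would be used as unique_hex_lines < input.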

Upvotes: 1

user557597


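This can be done with a regex, but it has to be verbose: it needs one alternative per possible character, so it can't really be generalized to an unknown alphabet.
The compact form comes first, followed by the same pattern laid out in free-spacing (x) mode.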

 # ^(?:[^a]*a[^a]*|[^b]*b[^b]*|[^c]*c[^c]*|[^d]*d[^d]*|[^e]*e[^e]*|[^f]*f[^f]*|[^0]*0[^0]*|[^1]*1[^1]*|[^2]*2[^2]*|[^3]*3[^3]*|[^4]*4[^4]*|[^5]*5[^5]*|[^6]*6[^6]*|[^7]*7[^7]*|[^8]*8[^8]*|[^9]*9[^9]*)$

 ^ 
 (?:
      [^a]* a [^a]* 
   |  [^b]* b [^b]* 
   |  [^c]* c [^c]* 
   |  [^d]* d [^d]* 
   |  [^e]* e [^e]* 
   |  [^f]* f [^f]* 

   |  [^0]* 0 [^0]* 
   |  [^1]* 1 [^1]* 
   |  [^2]* 2 [^2]* 
   |  [^3]* 3 [^3]* 
   |  [^4]* 4 [^4]* 
   |  [^5]* 5 [^5]* 
   |  [^6]* 6 [^6]* 
   |  [^7]* 7 [^7]* 
   |  [^8]* 8 [^8]* 
   |  [^9]* 9 [^9]* 
 )
 $
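The (?:...) group is PCRE syntax, but the same alternation also works as a plain POSIX ERE with an ordinary group. Since typing sixteen alternatives by hand is error-prone, here is a sketch that generates them in the shell; the variable names alts and pattern are illustrative.

```shell
# Build "[^0]*0[^0]*|[^1]*1[^1]*|...|[^f]*f[^f]*", one alternative per hex digit;
# anchored with ^...$, an alternative matches a whole line with that digit exactly once.
alts=
for c in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
    alts="${alts:+$alts|}[^$c]*$c[^$c]*"
done
pattern="^($alts)\$"

printf '%s\n' 'aa bb' '0b 5a' | grep -E "$pattern"   # prints: 0b 5a
```

Against the real data this would be grep -E "$pattern" input.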

To find out which character is the unique one, put a capture group around each letter and digit
and use a branch reset (?|...), so that every alternative stores its character in group 1:

 ^ 
 (?|
      [^a]* (a) [^a]* 
   |  [^b]* (b) [^b]* 
   |  [^c]* (c) [^c]* 
   |  [^d]* (d) [^d]* 
   |  [^e]* (e) [^e]* 
   |  [^f]* (f) [^f]* 

   |  [^0]* (0) [^0]* 
   |  [^1]* (1) [^1]* 
   |  [^2]* (2) [^2]* 
   |  [^3]* (3) [^3]* 
   |  [^4]* (4) [^4]* 
   |  [^5]* (5) [^5]* 
   |  [^6]* (6) [^6]* 
   |  [^7]* (7) [^7]* 
   |  [^8]* (8) [^8]* 
   |  [^9]* (9) [^9]* 
 )
 $

This is the output for the three sample lines (group 0 is the whole matched line, group 1 the unique character):

 **  Grp 0 -  ( pos 0 , len 50 ) 
0b 5a 3f 5a 7d d0 5d e6 2b c4 7e 7d c2 c0 e6 9a 

 **  Grp 1 -  ( pos 7 , len 1 ) 
f  

-----------------------

 **  Grp 0 -  ( pos 50 , len 51 ) 

84 bd aa 74 f3 85 da 9d ac b6 e0 b6 62 0f b5 d5

 **  Grp 1 -  ( pos 77 , len 1 ) 
c  

-----------------------

 **  Grp 0 -  ( pos 101 , len 51 ) 

c0 b0 f5 60 02 8b 1c a4 41 7c 53 f2 85 20 a0 d1

 **  Grp 1 -  ( pos 148 , len 1 ) 
d

Upvotes: 3

Dima Chubarov

Reputation: 17159

Here is a piece of code that uses several shell tools besides grep. It reads the input line by line and builds a frequency table of the characters in each line. On finding a character with frequency 1, it prints that character and the entire line.

while IFS= read -r line ; do
     export line                        # make the line visible to awk via ENVIRON
     printf '%s\n' "$line" | grep -o . | sort | uniq -c |
         awk '$1 == 1 { print $2 ":" ENVIRON["line"] ; exit }'
done < input

Note that if you are interested only in the hex digits (ignoring spaces), you could replace grep -o . with grep -o '[0-9a-f]'
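Dropping the exit turns the same pipeline into a way of listing every character that occurs only once, not just the first. A minimal sketch, with unique_chars as a hypothetical helper name, applied to the third sample line:

```shell
# Hypothetical helper: print every hex digit that occurs exactly once in "$1".
unique_chars() {
    printf '%s\n' "$1" | grep -o '[0-9a-f]' | sort | uniq -c |
        awk '$1 == 1 { print $2 }'
}

unique_chars 'c0 b0 f5 60 02 8b 1c a4 41 7c 53 f2 85 20 a0 d1'
# prints 3, 6, 7 and d, one per line
```

So besides the 'd' mentioned in the question, the digits 3, 6 and 7 are also unique in that line; with exit in place, the script above reports whichever comes first in sort order.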

Upvotes: 0
