Reputation: 2458
I have two *nix files. All of the data in each file is on a single line, with each value separated by a null character. Some of the values in the two files match.
How would I parse this data into a new file listing only the matching values?
I figure I could use sed to change the null characters into newlines? From there on I'm not really sure...
Any ideas?
Upvotes: 5
Views: 6091
Reputation: 58478
This might work for you:
parallel 'tr "\000" "\n" <{} | sort -u' ::: file{1,2} | sort | uniq -d
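As a sketch of what that pipeline does (the file names and values below are invented for illustration), here is the same per-file logic run sequentially without GNU parallel: deduplicate each file, then keep the values that appear in both streams.

```shell
tmp=$(mktemp -d)
printf 'apple\0banana\0cherry\0' > "$tmp/file1"
printf 'banana\0date\0cherry\0'  > "$tmp/file2"
# Per file: convert nulls to newlines, then sort -u to drop in-file duplicates.
# uniq -d then keeps only values that came out of both files.
result=$({ tr '\000' '\n' < "$tmp/file1" | sort -u
           tr '\000' '\n' < "$tmp/file2" | sort -u
         } | sort | uniq -d)
echo "$result"
rm -rf "$tmp"
```

GNU parallel just runs the quoted command once per file; the `sort -u` inside it matters because `uniq -d` would otherwise report a value duplicated within a single file.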
Upvotes: 2
Reputation: 4416
If there are no duplicate values within file1 or file2, you can do this:
( tr '\0' '\n' < file1; tr '\0' '\n' < file2 ) | sort | uniq -c | egrep -v '^ +1 '
This prints each value common to both files, along with its count. Note the trailing space in the pattern: without it, counts of 10, 11, 100, and so on would also be filtered out.
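A minimal runnable sketch of this counting approach, with invented sample data:

```shell
tmp=$(mktemp -d)
printf 'red\0green\0blue\0'    > "$tmp/file1"
printf 'green\0blue\0yellow\0' > "$tmp/file2"
# Values present in both files get a count of 2; filter out the count-1 lines.
# The trailing space in the pattern keeps counts like 10 from being dropped.
result=$( ( tr '\0' '\n' < "$tmp/file1"; tr '\0' '\n' < "$tmp/file2" ) \
          | sort | uniq -c | grep -Ev '^ +1 ' )
echo "$result"
rm -rf "$tmp"
```

Piping the result through something like `awk '{print $2}'` strips the counts if you only want the values themselves.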
If you want to avoid the temporary combined stream, you can do this:
comm -1 -2 <(tr '\0' '\n' < file1) <(tr '\0' '\n' < file2)
Note that comm expects its inputs to be sorted, so this works as written only if the values within each file are already in sorted order; otherwise add a sort inside each substitution. This approach is also not portable: it requires Bash's process-substitution feature.
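A self-contained sketch of the process-substitution variant (Bash required; sample data invented). Since comm needs sorted input, a sort is added inside each substitution:

```shell
tmp=$(mktemp -d)
printf 'cat\0dog\0emu\0' > "$tmp/file1"
printf 'dog\0emu\0fox\0' > "$tmp/file2"
# Each <(...) feeds comm a sorted, newline-separated view of one file.
# -1 -2 suppresses lines unique to either file, leaving only common lines.
result=$(comm -1 -2 <(tr '\0' '\n' < "$tmp/file1" | sort) \
                    <(tr '\0' '\n' < "$tmp/file2" | sort))
echo "$result"
rm -rf "$tmp"
```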
Upvotes: 4
Reputation: 16185
Use tr, sort, and comm:
Convert nulls into new lines, and sort the result:
$ tr '\000' '\n' < file1 | sort > file1.txt
$ tr '\000' '\n' < file2 | sort > file2.txt
Then use comm to get the lines that are common to both files:
$ comm -1 -2 file1.txt file2.txt
<lines shown here are the common lines between file1.txt and file2.txt>
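Putting the two steps together as a runnable sketch (the sample values are invented; the intermediate .txt files are the sorted, newline-separated versions of the originals):

```shell
tmp=$(mktemp -d)
printf 'alpha\0beta\0gamma\0' > "$tmp/file1"
printf 'beta\0delta\0gamma\0' > "$tmp/file2"
# Step 1: nulls -> newlines, sorted, into intermediate files.
tr '\000' '\n' < "$tmp/file1" | sort > "$tmp/file1.txt"
tr '\000' '\n' < "$tmp/file2" | sort > "$tmp/file2.txt"
# Step 2: -1 drops lines unique to file1.txt, -2 drops lines unique
# to file2.txt, leaving only the lines common to both.
result=$(comm -1 -2 "$tmp/file1.txt" "$tmp/file2.txt")
echo "$result"
rm -rf "$tmp"
```

Unlike the process-substitution one-liner, this works in any POSIX shell, at the cost of two temporary files.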
Upvotes: 10