Flethuseo

Reputation: 6189

Scanning file once and cutting different things from it?

I want to extract information from two columns of a text file. Currently my code extracts this information with three separate scans:

cut -d',' -f 8 file1.csv | sort -g | uniq -c | wc -l
cut -d',' -f 9 file1.csv | sort -g | uniq -c | wc -l
cut -d',' -f8,9 file1.csv | sort -g | uniq -c | wc -l

I would like to do it all while scanning the file only once. I also want the three line counts reported separately, not combined into one. Is this possible without writing a complex program?

Any help appreciated,

Ted.

Upvotes: 2

Views: 88

Answers (2)

Jonathan Leffler

Reputation: 754160

I'd use awk or perl (Python or Ruby would work too) to post-process the output of the last cut variant:

cut -d',' -f8,9 file1.csv |
awk -F, '{ field8[$1] = 1; field9[$2] = 1; field89[$1,$2] = 1; }
         END {
             i=0; for (j in field8)  { i++; }; print i;
             i=0; for (j in field9)  { i++; }; print i;
             i=0; for (j in field89) { i++; }; print i;
             }'

Or, simplifying since awk can split fields:

awk -F, '{ field8[$8] = 1; field9[$9] = 1; field89[$8,$9] = 1; }
         END {
             i=0; for (j in field8)  { i++; }; print i;
             i=0; for (j in field9)  { i++; }; print i;
             i=0; for (j in field89) { i++; }; print i;
             }' file1.csv

Since the question assumes there are no complications such as commas embedded in data fields, this answer ignores those issues too. Be aware, though, that CSV files in general can be too complex to process with simple tools like cut (and even awk). Perl has modules that handle CSV properly; so do other extensible scripting languages.
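
To make the caveat concrete, here is a small sketch using a made-up sample line (not taken from file1.csv) in which one field is quoted because it contains a comma. Both awk -F, and cut split on every comma, so the field numbering shifts:

```shell
# Made-up sample line: three logical CSV fields, the second one quoted
# because it contains a comma.
line='a,"b,c",d'

# awk -F, splits on every comma and knows nothing about the quotes,
# so it reports four fields rather than three.
printf '%s\n' "$line" | awk -F, '{ print NF }'

# cut has the same limitation: its "field 2" is only the first half
# of the quoted value.
printf '%s\n' "$line" | cut -d',' -f2
```

If the data can contain quoted fields like this, the column counts from any of the commands above would be unreliable, and a real CSV parser is the safer choice.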

Upvotes: 3

glenn jackman

Reputation: 246942

awk -F, '
    { a8[$8]; a9[$9]; a89[$8 FS $9] }
    END {
        c=0; for (e in a8)  c++; print "col 8: "   c
        c=0; for (e in a9)  c++; print "col 9: "   c
        c=0; for (e in a89) c++; print "col 8,9: " c
    }
' file1.csv
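
A possible variation, as a sketch only: the distinct counts can be kept up to date while the file is read, so the END block needs no loops. The function name count_distinct and the array names are made up for illustration; columns 8 and 9 and the comma separator come from the question, and the same assumption of no embedded commas applies.

```shell
# count_distinct is a hypothetical helper; it reads the named file(s),
# or standard input when no file is given.
count_distinct() {
    awk -F, '
        # The post-increment yields the previous count, so each pattern
        # is true only the first time a given key is seen.
        !seen8[$8]++        { n8++ }
        !seen9[$9]++        { n9++ }
        !seen89[$8 FS $9]++ { n89++ }
        END {
            # Adding 0 forces a numeric 0 when no lines were read.
            print "col 8: "   (n8 + 0)
            print "col 9: "   (n9 + 0)
            print "col 8,9: " (n89 + 0)
        }
    ' "$@"
}
```

It would be called as count_distinct file1.csv and still reads the file only once.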

Upvotes: 4
