Reputation: 6189
I want to extract information from two columns of a text file. Currently my code extracts this information with 3 different scans:
cut -d',' -f 8 file1.csv | sort -g | uniq -c | wc -l
cut -d',' -f 9 file1.csv | sort -g | uniq -c | wc -l
cut -d',' -f8,9 file1.csv | sort -g | uniq -c | wc -l
I would like to do it all while scanning the file only once. I also forgot to add that I want the 3 line counts reported separately, not combined into one. Is this possible without writing a complex program?
Any help appreciated,
Ted.
Upvotes: 2
Views: 88
Reputation: 754160
I'd use awk or perl (Python or Ruby could be used instead) to post-process the last variant of cut:
cut -d',' -f8,9 file1.csv |
awk -F, '{ field8[$1] = 1; field9[$2] = 1; field89[$1,$2] = 1; }
     END {
         i=0; for (j in field8) { i++; }; print i;
         i=0; for (j in field9) { i++; }; print i;
         i=0; for (j in field89) { i++; }; print i;
     }'
Or, simplifying, since awk can split the fields itself:
awk -F, '{ field8[$8] = 1; field9[$9] = 1; field89[$8,$9] = 1; }
     END {
         i=0; for (j in field8) { i++; }; print i;
         i=0; for (j in field9) { i++; }; print i;
         i=0; for (j in field89) { i++; }; print i;
     }' file1.csv
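For a quick sanity check, suppose file1.csv contains these three made-up lines (hypothetical data, with columns 8 and 9 as the last two fields):

a,b,c,d,e,f,g,x,1
a,b,c,d,e,f,g,x,2
a,b,c,d,e,f,g,y,1

Either version then prints 2, 2 and 3: two distinct values in column 8 (x and y), two in column 9 (1 and 2), and three distinct pairs.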
Since the question assumes there are no complications with commas embedded in data fields, etc., this answer ignores those issues too. Be aware, though, that CSV files in general can be too complex to process with simple tools like cut (and even awk). Perl has modules to handle CSV properly; so do other extensible scripting languages.
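For what it's worth, here is a minimal Perl sketch of the same one-pass count using the Text::CSV module (illustrative only; it assumes the file is file1.csv with at least 9 columns). Unlike cut/awk, Text::CSV parses quoted fields and embedded commas correctly:

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

my (%f8, %f9, %f89);
open my $fh, '<', 'file1.csv' or die "file1.csv: $!";
while (my $row = $csv->getline($fh)) {
    my ($c8, $c9) = @{$row}[7, 8];   # columns 8 and 9 (0-indexed)
    $f8{$c8} = 1;
    $f9{$c9} = 1;
    $f89{"$c8\x1f$c9"} = 1;          # join with \x1f, a separator unlikely to occur in data
}
close $fh;
print scalar(keys %f8), "\n";
print scalar(keys %f9), "\n";
print scalar(keys %f89), "\n";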
Upvotes: 3
Reputation: 246942
awk -F, '
    # referencing an element is enough to create the key, so each
    # distinct value (or pair) appears in its array exactly once
    { a8[$8]; a9[$9]; a89[$8 FS $9] }
    END {
        c=0; for (e in a8) c++; print "col 8: " c
        c=0; for (e in a9) c++; print "col 9: " c
        c=0; for (e in a89) c++; print "col 8,9: " c
    }
' file1.csv
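(Side note: with GNU awk, length() applied to an array returns its element count, a gawk extension rather than POSIX, so each counting loop could be replaced by something like print "col 8: " length(a8).)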
Upvotes: 4