Reputation: 1235
I have the following code that counts the number of characters in a file using awk,
but it doesn't count the line breaks the way $ wc file does.
File abc:
12345
12345
12345
12345
12345
awk command:
$ awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)c++}END{print "total chars:"c}' abc
This gives me the output:
total chars:25
But if I run wc on the same abc file,
it reports 30 characters.
Any suggestions? Can I use two field separators at a time?
Upvotes: 1
Views: 1985
Reputation: 9936
As I noted in this thread: Multiple Field separator in awk script, awk can only give a correct result for proper text files, where limits such as the maximum line length are observed and the last line ends with a newline, whereas wc does not have this limitation.
awk '{t+=length} END{print "Total chars: " NR+t}' file
wc does not care and will just count the characters.
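To make the arithmetic concrete, here is a minimal sketch using the sample file from the question. The function name count_chars and the temporary file are only illustrative, and it assumes plain ASCII text whose last line ends with a newline:

```shell
#!/bin/sh
# Hypothetical helper: characters in a text file = sum of the record lengths
# plus one newline per record (assumes the file ends with a newline).
count_chars() {
    awk '{ t += length($0) } END { print t + NR }' "$1"
}

# Recreate the question's sample file: five lines of "12345".
sample=$(mktemp)
printf '12345\n12345\n12345\n12345\n12345\n' > "$sample"

count_chars "$sample"   # 25 visible characters + 5 newlines
wc -c < "$sample"       # byte count from wc, for comparison
rm -f "$sample"
```

If the last line has no trailing newline, this over-counts by one, which is exactly the limitation described above.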
=== edit === This might work:
awk '
NR==FNR{
m++
next
}
{
t+=length
}
m==FNR+1{
RS="§"
}
END{
print "Total chars: " FNR+t-1
}
' file file
or in one line:
awk 'NR==FNR{ m++; next } { t+=length } m==FNR+1{ RS="§" } END{ print "Total chars: " FNR+t-1 } ' file file
The file is read twice: the first pass counts the lines, and during the second pass the record separator is changed just before the last line is read, so that the final record includes its trailing newline if there is one.
Upvotes: 3
Reputation: 204099
This is based on @Scrutinizer's solution, to show one way to handle files that might not end in a newline (using GNU awk for RT) and so address @konsolebox's concern:
gawk '{t+=length+(RT?1:0)} END{print t}' file
or, more efficiently, as @konsolebox pointed out:
gawk '{t+=length} END{print t+NR-(RT?0:1)}' file
To accommodate empty files:
gawk '{t+=length}END{print t+NR-(!RT&&NR?1:0)}' file
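As a rough sketch of how this behaves, here is a variation on the first form wrapped in a shell function. The name count_chars_gawk and the test inputs are my own, and it needs GNU awk because RT is a gawk extension:

```shell
#!/bin/sh
# Hypothetical helper; requires gawk. RT holds the text that terminated the
# current record, so it is empty only for a last line with no newline.
count_chars_gawk() {
    gawk '{ t += length($0) + (RT != "") } END { print t + 0 }' "$1"
}

if command -v gawk >/dev/null 2>&1; then
    tmp=$(mktemp)
    printf 'abc\nde\n' > "$tmp"; count_chars_gawk "$tmp"   # last line terminated
    printf 'abc\nde' > "$tmp";   count_chars_gawk "$tmp"   # last line unterminated
    : > "$tmp";                  count_chars_gawk "$tmp"   # empty file
    rm -f "$tmp"
fi
```

The t + 0 in END forces a numeric 0 for an empty file instead of an empty string. Note that length counts characters in the current locale, so for multibyte text compare against wc -m rather than wc -c.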
Upvotes: 5
Reputation: 75558
Your records are still separated with RS so the 5 newlines are excluded from the count.
Use another delimiter for your FS and RS, and calculate the length of the whole $0 instead:
awk 'BEGIN{FS=RS="\x1c"}{c+=length($0)}END{print "total chars:"c}' abc
Output:
total chars:30
Note that using "" or "\x00" would make it skip the last character.
By concept it's actually the same as:
awk 'BEGIN{FS=RS="\x1c"}END{print "total chars:" length($0)}' abc
This assumes that the file doesn't contain any \x1c. If it does, the result would be invalid either way.
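For anyone who wants to try the idea outside gawk, here is a hedged sketch that spells the separator as the octal escape "\034" (the same byte as \x1c), since hex escapes in awk strings are not guaranteed everywhere. The function name count_all is made up, and it still assumes the file contains no \034 byte:

```shell
#!/bin/sh
# Hypothetical helper: with RS set to a byte that never occurs in the file,
# the whole file (newlines included) arrives as a single record.
count_all() {
    awk 'BEGIN { RS = "\034" } { c += length($0) } END { print "total chars:" c + 0 }' "$1"
}

tmp=$(mktemp)
printf '12345\n12345\n12345\n12345\n12345\n' > "$tmp"
count_all "$tmp"
rm -f "$tmp"
```

Summing in the main rule rather than reading $0 in END avoids relying on $0 still being set in the END block. Very large files may hit record-size limits in some awk implementations.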
Upvotes: 2