Reputation: 426
DataFile content
1234t56
78t7891
Here the delimiter is t, and I need the output to be
3
(the three objects I want counted are 1234, 56<newline>78 and 7891)
It worked with grep, i.e., counting the occurrences of the delimiter and then adding one gives the number of records, but it's a performance bottleneck. Is there anything in awk that could help?
Upvotes: 0
Views: 1077
Reputation: 882088
Assuming t is your record delimiter, as seems to be the case from your phrase "counting occurrence of delimiter and then add one will give no. of lines", one way is to simply delete all characters that aren't the delimiter and count the remaining ones:
pax> ((count = $(echo '1234t5678t7891' | tr -c -d 't' | wc -c)))
pax> ((count++))
pax> echo $count
3
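The same pipeline also handles the two-line sample from the question, since the newline inside the second record is simply not counted. A quick sketch (printf stands in for the real data file):

```shell
# Recreate the two-line sample from the question and feed it through
# the delimiter-counting pipeline: count the t's, then add one.
count=$(printf '1234t56\n78t7891\n' | tr -c -d 't' | wc -c)
echo $((count + 1))   # prints 3
```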
This takes about 24 seconds wall time for a 3.5G file I just happened to have lying around, but only about 6 seconds CPU time:
pax> ll qq2
-rw-r--r-- 1 pax good_lookers 3541710600 Dec 30 16:32 qq2
pax> time ((count = $(tr -c -d 't' <qq2 | wc -c)))
real 0m24.163s
user 0m4.436s
sys 0m2.060s
pax> ((count++)) ; echo $count
10844976
Whether that's fast enough, I couldn't say, since you haven't provided the requirements there. Short of writing a bespoke program utilising things like large buffers, I don't think you'll get much better performance than a pipeline like that.
But, in any case, you should benchmark any potential solution with your own data as well. The primary mantra of optimisation is: measure, don't guess!
Upvotes: 3
Reputation:
Another awk way for your updated question:
awk -vRS='t' 'END{print NR}' file
Upvotes: 4
Reputation: 5298
Something like this:
echo "1234t5678t7891" | awk -F't' '{print NF}'
If processing file contents, you can change it to:
awk -F't' '{print NF}' File
Here, we set the delimiter to 't' (-F't'). Then we print the number of fields (print NF).
For your edited question:
tr -d '\n' < File | awk -F't' '{print NF}'
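On the sample data this behaves as follows (a sketch; printf stands in for the file):

```shell
# Delete newlines first so the whole file becomes a single awk record,
# then count the 't'-separated fields on that one record.
printf '1234t56\n78t7891\n' | tr -d '\n' | awk -F't' '{print NF}'
# prints 3
```

One caveat: deleting the newlines merges 56 and 78 into a single field 5678, so the count is correct but the field contents differ from the original records.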
Upvotes: 3