Nishad

Reputation: 426

Counting lines of a file that uses a custom row delimiter in a Unix shell script

DataFile content

1234t56
78t7891

Here the delimiter is t

and I need the output to be

3 

(the three objects I want counted would be 1234, 56<newline>78 and 7891)

It worked with grep, i.e. counting the occurrences of the delimiter and then adding one gives the number of lines,

but its performance is a hindrance. Is there anything in awk that could help?
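
For reference, the grep-based count described above could look something like the sketch below. The exact command is not shown in the question, so the use of grep -o and the helper name count_with_grep are assumptions; -o is supported by GNU and BSD grep but is not required by POSIX.

```shell
# Hypothetical helper: count delimiter-separated records in a file.
# Usage: count_with_grep DELIM FILE
count_with_grep() {
    # grep -o prints every match on its own line; -F treats DELIM literally.
    hits=$(grep -o -F -- "$1" "$2" | wc -l)
    # N delimiters separate N+1 records.
    echo $((hits + 1))
}

# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf '1234t56\n78t7891\n' > "$sample"
count_with_grep t "$sample"
rm -f "$sample"
```

Because grep -o writes one output line per match, a file with millions of delimiters produces millions of short lines for wc to count, which may be part of the slowdown.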

Upvotes: 0

Views: 1077

Answers (3)

paxdiablo

Reputation: 882088

Assuming t is your line delimiter, as seems to be the case from your phrase "counting occurrence of delimiter and then add one will give no. of lines", one way is to simply delete all characters that aren't the delimiter and count the remaining ones:

pax> ((count = $(echo '1234t5678t7891' | tr -c -d 't' | wc -c)))
pax> ((count++))
pax> echo $count
3

This takes about 24 seconds of wall time for a 3.5G file I just happened to have lying around, but only about 6.5 seconds of CPU time (user plus sys):

pax> ll qq2
-rw-r--r-- 1 pax good_lookers 3541710600 Dec 30 16:32 qq2

pax> time ((count = $(tr -c -d 't' <qq2 | wc -c)))
real    0m24.163s
user    0m4.436s
sys     0m2.060s

pax> ((count++)) ; echo $count
10844976

Whether that's fast enough, I couldn't say, since you haven't provided the requirements there. Short of writing a bespoke program utilising things like large buffers, I don't think you'll get much better performance than a pipeline like that.

But, in any case, you should benchmark any potential solution with your own data as well. The primary mantra of optimisation is: measure, don't guess!
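
A minimal way to act on that advice is to run each candidate against the same file and first check that they agree on the count. The sketch below is only an illustration: the function names and the generated sample file are made up for the example, and the awk variant is one possible alternative to compare against.

```shell
# Illustrative helpers; each prints the record count for delimiter 't'.
count_tr()  { echo $(( $(tr -c -d 't' < "$1" | wc -c) + 1 )); }
count_awk() { awk -v RS='t' 'END { print NR }' "$1"; }

# Build a throwaway sample by repeating the question's two lines 1000 times.
sample=$(mktemp)
awk 'BEGIN { for (i = 0; i < 1000; i++) print "1234t56\n78t7891" }' > "$sample"

# The candidates should print the same number before speed is compared.
echo "tr:  $(count_tr "$sample")"
echo "awk: $(count_awk "$sample")"
rm -f "$sample"
```

In bash, prefixing each call with the time keyword (for example, time count_tr yourfile) then shows how the candidates compare on your own data.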

Upvotes: 3

user3442743

Reputation:

Another awk approach, for your updated question:

awk -vRS='t' 'END{print NR}' file
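
To see why this works, it can help to print each record awk sees when RS is set to t. The sketch below recreates the question's sample data in a temporary file; the helper name show_records and the <newline> marker are just for illustration.

```shell
# Recreate the two-line sample from the question in a temporary file.
sample=$(mktemp)
printf '1234t56\n78t7891\n' > "$sample"

# Hypothetical helper: print every record awk sees, making newlines visible.
show_records() {
    awk -v RS='t' '{ gsub(/\n/, "<newline>"); print NR ": " $0 }' "$1"
}

show_records "$sample"

# The count itself: NR in the END block is the total number of records read.
awk -v RS='t' 'END { print NR }' "$sample"
rm -f "$sample"
```

The newline inside the file is treated as ordinary data here, so 56<newline>78 is a single record, and the final count for the sample is 3.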

Upvotes: 4

Arjun Mathew Dan

Reputation: 5298

Something like this:

echo "1234t5678t7891" | awk -F't' '{print NF}'

If processing file contents, you can change it to:

awk -F't' '{print NF}' File

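Here, we set the field delimiter to 't' (-F't') and then print the number of fields on each line (print NF). Note that some awk implementations (traditional awk, and gawk in compatibility mode) treat -Ft as a tab separator; if the counts look wrong, use -v FS='t' instead.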

For your edited question:

tr -d '\n' < File | awk -F't' '{print NF}'
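
As a quick check against the sample data from the question, the pipeline can be wrapped in a small function. This is only a sketch: the helper name count_fields is made up, and it uses the -v FS='t' spelling, which sets the separator to a literal t.

```shell
# Hypothetical helper: join all lines into one, then count 't'-separated fields.
count_fields() {
    # tr removes every newline, so awk sees the whole file as a single record.
    tr -d '\n' < "$1" | awk -v FS='t' '{ print NF }'
}

# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf '1234t56\n78t7891\n' > "$sample"
count_fields "$sample"
rm -f "$sample"
```

Since the whole file becomes one record, awk has to hold all of it in memory and split it into fields, which may matter for very large files.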

Upvotes: 3
