Reputation: 11294
Given a file with data like this (ie stores.dat file)
sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200
What is the command that would return the number of occurrences of the 't' character per line?
eg. would return:
count lineNum
4 1
3 2
6 3
Also, to do it by count of occurrences by field what is the command to return the following results?
eg. input of column 2 and character 't'
count lineNum
1 1
0 2
1 3
eg. input of column 3 and character 't'
count lineNum
2 1
1 2
4 3
Upvotes: 55
Views: 78919
Reputation: 71
perl -e 'while(<>) { $count = tr/t//; print "$count ".++$x."\n"; }' stores.dat
Another perl answer yay! The tr/t// function returns the count of the number of times the translation occurred on that line, in other words the number of times tr found the character 't'. ++$x maintains the line number count.
Upvotes: 1
Reputation: 31
$ cat -n test.txt
1 test 1
2 you want
3 void
4 you don't want
5 ttttttttttt
6 t t t t t t
$ awk '{n=split($0,c,"t")-1;if (n!=0) print n,NR}' test.txt
2 1
1 2
2 4
11 5
6 6
Upvotes: 3
Reputation: 3678
To count occurences of a character per line:
$ awk -F 't' '{print NF-1, NR}' input.txt
4 1
3 2
6 3
this sets field separator to the character that needs to be counted, then uses the fact that number of fields is one greater than number of separators.
To count occurences in a particular column cut
out that column first:
$ cut -d '|' -f 2 input.txt | awk -F 't' '{print NF-1, NR}'
1 1
0 2
1 3
$ cut -d '|' -f 3 input.txt | awk -F 't' '{print NF-1, NR}'
2 1
1 2
4 3
Upvotes: 28
Reputation: 77095
To count occurrence of a character per line you can do:
awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4 1
3 2
6 3
To count occurrence of a character per field/column you can do:
column 2:
awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1 1
0 2
1 3
column 3:
awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2 1
1 2
4 3
gsub()
function's return value is number of substitution made. So we use that to print the number. NR
holds the line number so we use it to print the line number. fld
and put the field number we wish to extract counts from. Upvotes: 53
Reputation: 706
awk '{gsub("[^t]",""); print length($0),NR;}' stores.dat
The call to gsub() deletes everything in the line that is not a t, then just print the length of what remains, and the current line number.
Want to do it just for column 2?
awk 'BEGIN{FS="|"} {gsub("[^t]","",$2); print NR,length($2);}' stores.dat
Upvotes: 3
Reputation: 10314
You could also split the line or field with "t" and check the length of the resulting array - 1. Set the col
variable to 0 for the line or 1 through 3 for columns:
awk -F'|' -v col=0 -v OFS=$'\t' 'BEGIN {
print "count", "lineNum"
}{
split($col, a, "t"); print length(a) - 1, NR
}
' stores.dat
Upvotes: 3
Reputation: 837
grep -n -o "t" stores.dat | sort -n | uniq -c | cut -d : -f 1
gives almost exactly the output you want:
4 1
3 2
6 3
Thanks to @raghav-bhushan for the grep -o
hint, what a useful flag. The -n flag includes the line number as well.
Upvotes: 52
Reputation: 16750
No need for awk or perl, only with bash and standard Unix utilities:
cat file | tr -c -d "t\n" | cat -n |
{ echo "count lineNum"
while read num data; do
test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
done; }
And for a particular column:
cut -d "|" -f 2 file | tr -c -d "t\n" | cat -n |
{ echo -e "count lineNum"
while read num data; do
test ${#data} -gt 0 && printf "%4d %5d\n" ${#data} $num
done; }
And we can even avoid tr
and the cat
s:
echo "count lineNum"
num=1
while read data; do
new_data=${data//t/}
count=$((${#data}-${#new_data}))
test $count -gt 0 && printf "%4d %5d\n" $count $num
num=$(($num+1))
done < file
and event the cut:
echo "count lineNum"
num=1; OLF_IFS=$IFS; IFS="|"
while read -a array_data; do
data=${array_data[1]}
new_data=${data//t/}
count=$((${#data}-${#new_data}))
test $count -gt 0 && printf "%4d %5d\n" $count $num
num=$(($num+1))
done < file
IFS=$OLF_IFS
Upvotes: 4
Reputation: 486
cat stores.dat | awk 'BEGIN {FS = "|"}; {print $1}' | awk 'BEGIN {FS = "\t"}; {print NF}'
Where $1
would be a column number you want to count.
Upvotes: 1
Reputation: 36262
One possible solution using perl
:
Content of script.pl:
use warnings;
use strict;
## Check arguments:
## 1.- Input file
## 2.- Char to search.
## 3.- (Optional) field to search. If blank, zero or bigger than number
## of columns, default to search char in all the line.
(@ARGV == 2 || @ARGV == 3) or die qq(Usage: perl $0 input-file char [column]\n);
my ($char,$column);
## Get values or arguments.
if ( @ARGV == 3 ) {
($char, $column) = splice @ARGV, -2;
} else {
$char = pop @ARGV;
$column = 0;
}
## Check that $char must be a non-white space character and $column
## only accept numbers.
die qq[Bad input\n] if $char !~ m/^\S$/ or $column !~ m/^\d+$/;
print qq[count\tlineNum\n];
while ( <> ) {
## Remove last '\n'
chomp;
## Get fields.
my @f = split /\|/;
## If column is a valid one, select it to the search.
if ( $column > 0 and $column <= scalar @f ) {
$_ = $f[ $column - 1];
}
## Count.
my $count = eval qq[tr/$char/$char/];
## Print result.
printf qq[%d\t%d\n], $count, $.;
}
The script accepts three parameters:
Running the script without arguments:
perl script.pl
Usage: perl script.pl input-file char [column]
With arguments and its output:
Here 0 is a bad column, it searches all the line.
perl script.pl stores.dat 't' 0
count lineNum
4 1
3 2
6 3
Here it searches in column 1.
perl script.pl stores.dat 't' 1
count lineNum
0 1
2 2
0 3
Here it searches in column 3.
perl script.pl stores.dat 't' 3
count lineNum
2 1
1 2
4 3
th
is not a char.
perl script.pl stores.dat 'th' 3
Bad input
Upvotes: 4