Reputation:
Given a txt file, that has the following values:
123
123
234
234
123
345
I use
sort FILE | uniq -cd
to get the number of times each duplicated value occurs. But how could I also output the rows where each value was found?
Output:
123 3 0;1;4
234 2 2;3
The row numbering is zero-based, hence the numbers above.
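For reference, sort FILE | uniq -cd alone reports only the counts, with no row information; on the sample above it prints something like this (exact spacing depends on the uniq implementation):
$ sort FILE | uniq -cd
      3 123
      2 234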
Upvotes: 1
Views: 75
Reputation: 88601
awk '
{
    # count occurrences of the first-column value
    frequency[$1]++
    # append the zero-based line number to the list for this value
    if (line[$1] == "")
        line[$1] = NR-1
    else
        line[$1] = line[$1] ";" (NR-1)
}
END {
    # print only values that occur more than once
    for (j in frequency)
        if (frequency[j] > 1)
            print j, frequency[j], line[j]
}' file
$1: content of the first column
NR: current line number (1-based, which is why the script uses NR-1)
Output:
234 2 2;3
123 3 0;1;4
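Note that awk's for (j in frequency) loop visits keys in an unspecified order, which is why 234 comes before 123 above. If you need deterministic output, one option is to pipe the result through sort; a minimal sketch, assuming the script above is saved as count_rows.awk (a hypothetical filename):
$ awk -f count_rows.awk file | sort -n
123 3 0;1;4
234 2 2;3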
Upvotes: 0
Reputation: 77251
I know the question is tagged awk/sed, but for the sake of comparison, look how much more verbose the Python version is:
import sys

# map each distinct value to the list of zero-based line numbers where it appears
dictionary = {}
for i, line in enumerate(sys.stdin):
    dictionary.setdefault(line.strip(), []).append(str(i))

for value, line_numbers in dictionary.items():
    print(value, len(line_numbers), ";".join(line_numbers))
Testing:
$ python script.py < FILE
123 3 0;1;4
234 2 2;3
345 1 5
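Note that, unlike uniq -d, this also prints values that occur only once (345 above). A minimal way to keep only duplicates, assuming the script is saved as script.py as in the test above, is to filter on the count field:
$ python script.py < FILE | awk '$2 > 1'
123 3 0;1;4
234 2 2;3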
Upvotes: 0
Reputation: 92854
awk solution:
awk '{ a[$1]=($1 in a? a[$1]";":"")(NR-1); cnt[$1]++ }
END{ for(i in a) if(a[i]~/;/) { print i,cnt[i],a[i] } }' file
a[$1]=($1 in a? a[$1]";":"")(NR-1) - accumulates the row numbers (starting from 0) for each grouped value $1, concatenating multiple occurrences with ;
cnt[$1]++ - counts the number of occurrences of each value
The output:
123 3 0;1;4
234 2 2;3
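To see the accumulation idiom on its own, here is a toy run (no duplicate filter here, and key order in the END loop is unspecified, so the lines may come out in any order):
$ printf 'x\ny\nx\n' | awk '{ s[$1]=($1 in s? s[$1]";":"")(NR-1) } END{ for(k in s) print k, s[k] }'
x 0;2
y 1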
Upvotes: 1