Reputation: 15683
In a two-column CSV, I want to check the successive numbers (the values are integers):
1,1
4,1
5,1
6,1
7,1
10,1
11,1
4,2
5,2
6,2
10,2
5,3
5,7
5,8
5,9
I tried
awk -F, '
{
if($1==a+1) { n+=1 }
else { n = 1 }
{a=$1}
} {print $1"," n}
' file.txt
to produce
1,1,1
4,1,1
5,1,2
6,1,3
7,1,4
10,1,1
11,1,2
4,2,1
5,2,2
6,2,3
10,2,1
5,3,1
5,7,1
5,8,1
5,9,1
where the third column counts the successive rows with respect to the first column.
This is the expected output including the fourth column.
1,1,1,1
4,1,1,1
5,1,2,1
6,1,3,1
7,1,4,1
10,1,1,1
11,1,2,1
4,2,1,1
5,2,2,1
6,2,3,1
10,2,1,1
5,3,1,1
5,7,1,1
5,8,1,2
5,9,1,3
My idea is to sort the produced 3-column CSV (sort -t, -k1,1n -k2,2n
) and do the same thing with the second column, but I am not sure.
EDIT: I made a mistake in the first dataset. When we count the first column, the second column is constant, and vice versa.
Upvotes: 0
Views: 54
Reputation: 785721
This awk
should work for you:
awk 'BEGIN {
FS=OFS=","
}
{
$3 = ($1 == p1+1 ? $3+1 : 1)
$4 = ($1 == p1 && $2 == p2+1 ? $4+1 : 1)
}
{
p1 = $1
p2 = $2
} 1' file.csv
1,1,1,1
4,1,1,1
5,1,2,1
6,1,3,1
7,1,4,1
10,1,1,1
11,1,2,1
4,2,1,1
5,2,2,1
6,2,3,1
10,2,1,1
5,3,1,1
5,7,1,1
5,8,1,2
5,9,1,3
Upvotes: 3
Reputation: 133680
Could you please try following, written and tested on shown samples(basically re-designed OP's code in more awkish way :) , thanks to OP for letting know in comments).
awk '
BEGIN{
FS=OFS=","
}
{
count=$1-prev==1 && prev?++count:"1"
print $0,count
prev=$1
}
' Input_file
Explanation:
Output will be as follows.
1,10,1
4,10,1
5,10,2
6,11,3
7,14,4
10,9,1
11,2,2
NOTE: In case you want to sort output after awk's calculations then we could further use | sort
to perform sorting on 3rd column IMHO.
Upvotes: 2