Reputation: 301
Hello and thank you for taking the time to read this question. For the last day I have been trying to solve a problem and haven’t come any closer to a solution. I have a sample file of data that contains the following:
Fighter@Trainer
Bobby@SamBonen
Billy@BobBrown
Sammy@DJacobson
James@DJacobson
Donny@SonnyG
Ben@JasonS
Dave@JuanO
Derrek@KMcLaughlin
Dillon@LGarmati
Orson@LGarmati
Jeff@RodgerU
Brad@VCastillo
The goal is to identify “Trainers” that have more than one fighter. My gut feeling is that the “getline” and variable declaration directives in AWK are going to be needed. I have tried different combinations of:
awk -F@ 'NR>1{a=$2; getline; if($2 = a) {print $0,"Yes"} else {print $0,"NO"}}' sample.txt
Yet, the output is nowhere near the desired results. In fact, it doesn’t even output all the rows in the sample file!
My desired results are:
Fighter@Trainer
Bobby@SamBonen@NO
Billy@BobBrown@NO
Sammy@DJacobson@YES
James@DJacobson@YES
Donny@SonnyG@NO
Ben@JasonS@NO
Dave@JuanO@NO
Derrek@KMcLaughlin@NO
Dillon@LGarmati@YES
Orson@LGarmati@YES
Jeff@RodgerU@NO
Brad@VCastillo@NO
I am completely lost as to where to go from here. I have been searching and trying to find a solution to no avail, and I'm looking for some input. Thank you!
Upvotes: 0
Views: 80
Reputation: 10865
Another option is to make two passes:
$ cat p.awk
BEGIN {FS=OFS="@"}
NR==1 {print;next};
NR==FNR {++trainers[$2]; next}
FNR>1 {$3=(trainers[$2]>1)?"YES":"NO"; print}
$ awk -f p.awk p.txt p.txt
Fighter@Trainer
Bobby@SamBonen@NO
Billy@BobBrown@NO
Sammy@DJacobson@YES
James@DJacobson@YES
Donny@SonnyG@NO
Ben@JasonS@NO
Dave@JuanO@NO
Derrek@KMcLaughlin@NO
Dillon@LGarmati@YES
Orson@LGarmati@YES
Jeff@RodgerU@NO
Brad@VCastillo@NO
Explained:
Set the input and output field separators:
BEGIN {FS=OFS="@"}
Print the header:
NR==1 {print;next};
First pass, count occurrences of each trainer:
NR==FNR {++trainers[$2]; next}
Second pass, set YES or NO according to trainer count, and print result:
FNR>1 {$3=(trainers[$2]>1)?"YES":"NO"; print}
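If the NR==FNR test looks opaque, here is a minimal sketch of how the two counters behave when the same file is named twice on the command line. The temporary file and the pass1/pass2 labels are my own illustration, not part of p.awk: NR keeps counting across files, FNR restarts at 1 for each file, so the two are equal only during the first pass.

```shell
# Hypothetical demo file (name and contents are assumptions for illustration).
demo_file=$(mktemp)
printf '%s\n' 'Fighter@Trainer' 'Sammy@DJacobson' 'James@DJacobson' 'Donny@SonnyG' > "$demo_file"

# Read the same file twice and label each record with the pass it belongs to.
# NR == FNR holds only while awk is still reading the first file argument.
nr_fnr_output=$(awk '{ print NR, FNR, (NR == FNR ? "pass1" : "pass2"), $0 }' "$demo_file" "$demo_file")
printf '%s\n' "$nr_fnr_output"

rm -f "$demo_file"
```

The first four lines of output are tagged pass1 and the last four pass2, which is why the counting rule can end with next and leave the second pass to the FNR>1 rule.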
Upvotes: 1
Reputation: 124646
You don't need getline.
You could just process the input normally, building up counts per trainer, and print the result in an END block:
awk -F@ '{
    lines[NR] = $0;
    trainers[NR] = $2;
    counts[$2]++;
}
END {
    print lines[1];
    # NR holds the total record count here; length(array) is not available in every awk
    for (i = 2; i <= NR; i++) {
        print lines[i] "@" (counts[trainers[i]] > 1 ? "YES" : "NO");
    }
}' sample.txt
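As for why the getline attempt in the question misbehaves, here is a rough sketch of the two effects I believe are at play. It uses a shortened copy of the sample data and made-up variable names. First, each getline call consumes the next record, so the main rule only fires for every other row. Second, `$2 = a` is an assignment rather than a comparison, so it overwrites the field, rebuilds $0 with the default output separator (a space), and tests true for any non-empty value.

```shell
# 1) getline swallows the following record: four data rows, but the rule runs only twice.
getline_output=$(printf '%s\n' 'Fighter@Trainer' 'Bobby@SamBonen' 'Billy@BobBrown' 'Sammy@DJacobson' 'James@DJacobson' |
  awk -F@ 'NR > 1 { first = $1; getline; print first " -> " $1 }')
printf '%s\n' "$getline_output"

# 2) A single = inside if() assigns to the field; $0 is rebuilt with OFS (a space by default).
assign_output=$(printf '%s\n' 'x@y' | awk -F@ '{ if ($2 = "z") print $0 }')
printf '%s\n' "$assign_output"
```

The first command pairs Bobby with Billy and Sammy with James instead of visiting each row, and the second prints the record with the @ replaced by a space. Counting trainers in an array sidesteps both issues.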
Upvotes: 4