Reputation: 1433
Let's say I have only the first column of the following dataset in a file, and I want to emulate the flag in the second column so that I can export only the rows where flag = 1 (the dataset is pre-sorted by the target column):
1 1
1 0
1 0
2 1
2 0
2 0
I could run awk '!seen[$1]++' dataset
but that would run into a problem for very large files (the seen
array keeps growing). Is there an alternative that handles this without tracking every single unique value of the target column (here column #1)? Thanks.
Upvotes: 0
Views: 30
Reputation: 206232
So you only have the first column, and would like to generate the second? I think a slightly different awk command could work:
awk '{if (NR>1 && last==$1) {flag=0} else {last=$1; flag=1}; print $0,flag}' file.txt
Basically, you just check whether the first field matches the one from the previous row. Since the input is sorted, you don't have to keep track of everything you've seen; the last value alone tells you whether the current one is new.
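If the goal is only to export the first row of each group, the flag can probably be skipped altogether. Here is a minimal sketch of that idea, assuming whitespace-separated fields and input already sorted on column 1; the function name first_of_group and the variable prev are just illustrative, not anything from the question:

```shell
# Print only the first row of each run of equal values in column 1.
# Memory use stays constant: only the previous key (prev) is kept.
# Assumes the input is already sorted (or at least grouped) on column 1.
first_of_group() {
    # NR == 1 lets the first row through even if its key is empty or 0.
    awk 'NR == 1 || $1 != prev { print } { prev = $1 }' "$@"
}

# Usage with the single-column file from the question:
#   first_of_group dataset
```

Note that awk compares numeric-looking fields numerically, so keys such as 1 and 1.0 would count as the same group; appending "" to both sides of the comparison should force a string comparison if that matters.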
Upvotes: 1
Reputation: 627
Seems like grep would be fine for this:
$ grep " 1" dataset
Upvotes: 0