Reputation: 441
I have a large text file containing patterns such as
*pattern1, 34:38,info=a1,signal=s1
*pattern2, 32:38,info=a1,signal=s1
*pattern2,36:38,info=a1,signal=s1
*pattern_4,38:38,info=a1,signal=s1
I want to extract the unique first words (the text before the first comma) using grep. I tried
grep '^*[A-Za-z]' file.txt | sort --uniq
and
grep '^*[^,]' file.txt | sort --uniq
but neither prints only the first word. Can anyone comment?
Upvotes: 0
Views: 1636
Reputation: 93636
I tried using
grep '^*[A-Za-z]' file.txt | sort --uniq
grep by default shows the entire line that it matches. If you want grep to show only what was matched, use the -o option.
grep -o '^[^,]*' file.txt | sort -u
The [^,] means "anything that isn't a comma", so ^[^,]* matches everything from the start of the line up to the first comma.
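To make the difference concrete, here is a small sketch that runs the same pattern with and without -o on the sample lines from the question. The temporary file and the variable names (sample, whole_lines, first_fields) are only illustrative assumptions, not part of the original post.

```shell
# Recreate the sample lines from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
*pattern1, 34:38,info=a1,signal=s1
*pattern2, 32:38,info=a1,signal=s1
*pattern2,36:38,info=a1,signal=s1
*pattern_4,38:38,info=a1,signal=s1
EOF

# Without -o, grep prints every matching line in full.
whole_lines=$(grep '^[^,]*' "$sample")

# With -o, only the text matched by the pattern is printed;
# sort -u then removes the duplicate *pattern2.
first_fields=$(grep -o '^[^,]*' "$sample" | sort -u)

printf '%s\n' "$first_fields"
rm -f "$sample"
```

With the four sample lines, whole_lines still holds four complete lines, while first_fields holds only the three distinct first words.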
Upvotes: 1
Reputation: 133428
With your shown samples and GNU awk, you could try the following, which uses gensub. It prints the unique values of the first column across the whole Input_file.
awk '!seen[$0=gensub(/,.*/,"",1)]++' Input_file
Explanation: gensub removes everything from the first comma onward, and the result is assigned to $0. The seen array counts how often each value has occurred, so !seen[...]++ is true only the first time a value appears, and only then is the line printed.
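Since gensub is a GNU awk extension, the same idea could be sketched with the standard sub function for other awk implementations. This is only an illustrative variant under that assumption, not the answerer's command; the variable name unique_first is made up for the example.

```shell
# Feed the sample lines from the question to awk.
unique_first=$(
  printf '%s\n' \
    '*pattern1, 34:38,info=a1,signal=s1' \
    '*pattern2, 32:38,info=a1,signal=s1' \
    '*pattern2,36:38,info=a1,signal=s1' \
    '*pattern_4,38:38,info=a1,signal=s1' |
  # sub() strips everything from the first comma onward in $0;
  # the !seen[$0]++ pattern then prints each remaining value once.
  awk '{ sub(/,.*/, "") } !seen[$0]++'
)
printf '%s\n' "$unique_first"
```

Unlike the sort -u pipelines, this keeps the values in the order in which they first appear in the input.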
Upvotes: 1
Reputation: 784928
To get the first word and make it unique, you may use this awk command:
awk -F, '!uniq[$1]++ {print $1}' file
*pattern1
*pattern2
*pattern_4
The condition !uniq[$1]++ is true only when $1 has not been seen yet, i.e. when its count in the array uniq is still 0. The post-increment then sets its value to 1, so !uniq[$1]++ is false for every later line with the same $1. {print $1} is executed only in the true case.
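The counter behaviour described above can be traced with a small sketch. The variables before and keep, and the shell variable trace, are hypothetical names introduced only for this illustration.

```shell
trace=$(
  printf '%s\n' \
    '*pattern1, 34:38,info=a1,signal=s1' \
    '*pattern2, 32:38,info=a1,signal=s1' \
    '*pattern2,36:38,info=a1,signal=s1' \
    '*pattern_4,38:38,info=a1,signal=s1' |
  awk -F, '{
      before = uniq[$1]      # count before this line (unset counts as 0)
      keep = !uniq[$1]++     # 1 only when the count was 0, then increment
      printf "%s before=%d keep=%d\n", $1, before, keep
  }'
)
printf '%s\n' "$trace"
```

The second *pattern2 line is the only one reported with keep=0, which is why the original command prints *pattern2 just once.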
Upvotes: 3
Reputation: 241768
If you know the fields are comma-separated, just search for anything but a comma from the start of each line. Use the -o option to print only the matching part of each line. grep is usually used for filtering, not for extraction, but this option makes extraction possible in simple cases like this one.
grep -o '^[^,]*' file.txt | sort -u
Upvotes: 3