Reputation: 441

Extract first word using regex in grep

I have a large text file containing patterns such as

*pattern1, 34:38,info=a1,signal=s1
*pattern2, 32:38,info=a1,signal=s1
*pattern2,36:38,info=a1,signal=s1
*pattern_4,38:38,info=a1,signal=s1

I want to extract the unique first words (everything before the first comma) using grep. I tried grep '^*[A-Za-z]' file.txt | sort --uniq and grep '^*[^,]' file.txt | sort --uniq, but neither prints only the first word. Can anyone comment?

Upvotes: 0

Answers (4)

Andy Lester

Reputation: 93636

I tried using grep '^*[A-Za-z]' file.txt | sort --uniq

grep by default shows the entire line that it matches. If you want grep to show only what was matched, use the -o option.

grep '^[^,]*' -o file.txt | sort -u

The [^,] means "anything that isn't a comma", so ^[^,]* matches everything from the start of the line up to the first comma.
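To see the effect of -o on the question's data, here is a small sketch. It assumes a temporary file created with mktemp stands in for file.txt; the variable names are only for illustration.

```shell
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
*pattern1, 34:38,info=a1,signal=s1
*pattern2, 32:38,info=a1,signal=s1
*pattern2,36:38,info=a1,signal=s1
*pattern_4,38:38,info=a1,signal=s1
EOF

# Without -o, grep prints every matching line in full.
whole_lines=$(grep '^[^,]*' "$sample")

# With -o, only the matched text (everything before the first comma) is printed;
# sort -u then drops the duplicate *pattern2.
first_words=$(grep -o '^[^,]*' "$sample" | sort -u)

printf '%s\n' "$first_words"
rm -f "$sample"
```

The first command still returns all four lines unchanged, which is the behaviour the question ran into; the second returns three distinct first words.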

Upvotes: 1

RavinderSingh13

Reputation: 133428

With your shown samples, you could try the following, which uses gensub from GNU awk. It prints the unique values of the first column across the whole Input_file.

awk '!seen[$0=gensub(/,.*/,"",1)]++' Input_file

Explanation: gensub removes everything from the first comma onward, and the result is assigned to $0. The seen array then counts each value, so !seen[...]++ is true only for the first occurrence, and duplicate values are not printed.
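If GNU awk is not available, the same idea can be sketched with the standard sub() function instead of gensub. This is a substitute technique, not the command from this answer, and first_fields_unique is a hypothetical helper name.

```shell
# Hypothetical helper: reads lines on stdin, prints each distinct first field once.
first_fields_unique() {
  # sub() deletes from the first comma to the end of the line, changing $0 in place.
  # !seen[$0]++ is true only the first time a given value of $0 appears.
  awk '{ sub(/,.*/, "") } !seen[$0]++'
}

printf '%s\n' \
  '*pattern1, 34:38,info=a1' \
  '*pattern2, 32:38,info=a1' \
  '*pattern2,36:38,info=a1' | first_fields_unique
```

Unlike the sort -u pipelines, this keeps the values in the order they first appear in the input.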

Upvotes: 1

anubhava

Reputation: 784928

To get the first word and make it unique, you may use this awk command:

awk -F, '!uniq[$1]++ {print $1}' file

*pattern1
*pattern2
*pattern_4

The condition !uniq[$1]++ is true only when $1 is not yet in the array uniq. The post-increment then sets its value to 1, so !uniq[$1]++ is false for every later line with the same $1.

{print $1} is executed only in the true case.
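To make the counter visible, the following sketch prints each first field next to the value that uniq[$1]++ yields before the increment. show_counts is a hypothetical helper name, and the input lines are shortened versions of the question's samples.

```shell
# Hypothetical helper: prints the first field and the pre-increment counter value.
show_counts() {
  # uniq[$1]++ evaluates to the old count (0 on first sight), then adds 1.
  awk -F, '{ print $1, uniq[$1]++ }'
}

printf '%s\n' \
  '*pattern1, 34:38' \
  '*pattern2, 32:38' \
  '*pattern2,36:38' \
  '*pattern_4,38:38' | show_counts
```

Only the second *pattern2 line shows a non-zero count, and that is the line the negated condition filters out.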

Upvotes: 3

choroba

Reputation: 241768

If you know the words are comma separated, just search for anything but a comma from the start of each line.

Use the -o option to print only the matching part of each line. grep is usually used for filtering rather than extraction, but this option makes simple extraction possible.

grep -o '^[^,]*' file.txt | sort -u

Upvotes: 3
