Atari911

Reputation: 183

sed - Remove all of a line except matching pattern

I am trying to parse hashtags out of a file. For instance:

Some text here #Foo Some other text here....

I would like the output to be:

#Foo

The text before and after the # can change and I'm trying to apply this to multiple lines of the file. Every line will have a # in it as I already grep'd the file for hashtags.

Basically, I'm trying to create a list of the hashtags contained in a file. If there is also a way to remove duplicate tags from the resulting output, that would be a bonus.

Upvotes: 2

Views: 4274

Answers (2)

heemayl

Reputation: 42117

With sed:

sed -E 's/^[^#]*(#[^[:blank:]]*).*/\1/'

  • ^[^#]* matches the portion of the line before the first #

  • (#[^[:blank:]]*) matches the # followed by any number of characters that are neither spaces nor tabs, and puts the match in capture group 1

  • .* matches the rest of the line

  • In the replacement, \1 refers to capture group 1, so the whole line is replaced by the hashtag

Example:

% sed -E 's/^[^#]*(#[^[:blank:]]*).*/\1/' <<<'Some text here #Foo Some other text here'
#Foo
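
Note that this substitution keeps only the first hashtag on each line. For the "bonus" part of the question, one option is to pipe the result through awk to drop repeats while keeping the order of first appearance. The following is only a sketch: the sample lines and the variable names `demo` and `first_tags` are made up for illustration, and it assumes a sed that supports -E.

```shell
# Hypothetical sample input, written to a temporary file.
demo=$(mktemp)
printf '%s\n' \
  'Some text here #Foo Some other text here' \
  'Another line with #Bar in it' \
  'A repeat of #Foo at the end' > "$demo"

# Reduce each line to its first hashtag, then let awk print a tag
# only the first time it is seen (order of first appearance is kept).
first_tags=$(sed -E 's/^[^#]*(#[^[:blank:]]*).*/\1/' "$demo" | awk '!seen[$0]++')
printf '%s\n' "$first_tags"

rm -f "$demo"
```

With the sample lines above this should print #Foo followed by #Bar.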

Upvotes: 1

Cyrus

Reputation: 88899

With GNU grep:

grep -o '#[^ ]*' file
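
Because -o prints every match on its own line, this also picks up lines that contain more than one hashtag. To get the de-duplicated list the question asks for, the output can be passed to sort -u. A minimal sketch, assuming made-up sample lines and the illustrative variable names `demo` and `unique_tags`:

```shell
# Hypothetical sample input, written to a temporary file.
demo=$(mktemp)
printf '%s\n' \
  'Some text here #Foo Some other text here' \
  'Two tags on one line: #Bar and #Foo' > "$demo"

# grep -o emits each hashtag separately; sort -u sorts them
# and removes duplicates.
unique_tags=$(grep -o '#[^ ]*' "$demo" | sort -u)
printf '%s\n' "$unique_tags"

rm -f "$demo"
```

The pattern '#[^ ]*' stops only at a space, so a tag followed directly by punctuation or a tab would carry that character along; '#[^[:blank:]]*' is one way to also stop at tabs.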

Upvotes: 3
