user2757192

Reputation: 115

Delete specific words + symbol in Bash

I have a list of MAC vendors, and I need to parse the text to delete unnecessary information.

If I have this:

F8FEA8 Technico # Technico Japan Corporation
F8FF5F Shenzhen # Shenzhen Communication Technology Co.,Ltd
FC0012 ToshibaS # Toshiba Samsung Storage Technolgoy Korea Corporation
FC019E Vievu
FC01CD Fundacio # FUNDACION TEKNIKER
FC0647 Cortland # Cortland Research, LLC
FC0877 PrentkeR
FC0A81 Motorola # Motorola Solutions Inc.

I need to delete every [space][word][space][#] sequence to get this:

F8FEA8 Technico Japan Corporation
F8FF5F Shenzhen Communication Technology Co.,Ltd
FC0012 Toshiba Samsung Storage Technolgoy Korea Corporation
FC019E Vievu
FC01CD FUNDACION TEKNIKER
FC0647 Cortland Research, LLC
FC0877 PrentkeR
FC0A81 Motorola Solutions Inc.

Can it be done with grep or sed?

Sorry for my bad English.

Upvotes: 1

Views: 158

Answers (5)

Jotne

Reputation: 41456

More awk

awk -F" [^ ]+ # " '{$1=$1}1' file # more robust
awk -F" [^ ]+ # " '$1=$1' file    # risky: skips any line where $1 is 0 or empty

This sets the field separator to the text we want to remove (the short name and the #), then rebuilds the record from the remaining fields and prints it.
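
To see why the second form is risky, here is a small sketch. The function names rebuild_safe and rebuild_risky and the sample line "0 zero" are made up for illustration; the point is only that a bare $1=$1 acts as a condition, so a line whose first field evaluates to 0 is not printed.

```shell
# Hypothetical helpers (names are not from the answer).
rebuild_safe()  { awk '{$1=$1}1'; }   # assignment runs as an action; the trailing 1 always prints
rebuild_risky() { awk '$1=$1'; }      # assignment is the pattern; a false value suppresses printing

# "0 zero" is an invented line whose first field is numerically zero.
printf '%s\n' '0 zero' 'F8FEA8 Technico' | rebuild_safe
printf '%s\n' '0 zero' 'F8FEA8 Technico' | rebuild_risky
```

The first call should print both lines, the second only the F8FEA8 line. MAC prefixes in this list never evaluate to 0, so either form behaves the same here, but the braces cost nothing.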

awk '{sub(/ [^ ]+ #/,"")}1' file

This simply removes the part we do not want.

Upvotes: 2

potong

Reputation: 58401

This may work for you (GNU sed):

sed -ri 's/\s\S+\s#//' file

or:

sed -i 's/ [^ ][^ ]* #//' file

This means: look for a space followed by one or more non-spaces, followed by another space and a #, then delete the match. The -i option updates the file in place. The -r option in the first solution enables extended regular expressions, which lets you write \S+ instead of \S\+ or [^ ][^ ]*.
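
If you want to try the expression before touching the file, a minimal sketch is to drop -i and feed a couple of sample lines on stdin. The function name strip_short_name is hypothetical, and the two input lines are taken from the question.

```shell
# strip_short_name is a made-up helper name, not part of the answer.
# It reads vendor lines on stdin and prints them with " word #" removed.
strip_short_name() {
  sed 's/ [^ ][^ ]* #//'
}

# Preview on two lines from the question; nothing is written to disk.
strip_short_name <<'EOF'
FC0012 ToshibaS # Toshiba Samsung Storage Technolgoy Korea Corporation
FC019E Vievu
EOF
```

Lines without a # pass through unchanged, because the substitution simply does not match.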

Upvotes: 4

technosaurus

Reputation: 7802

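Here is a shell-only solution:

while read -r A B C D; do
  # If field 3 is "#", keep the prefix and the long name; otherwise print the line without trailing spaces.
  if [ "$C" = "#" ]; then echo "$A $D"; else echo "$A $B${C:+ $C}${D:+ $D}"; fi
done < infile.txt > outfile.txt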

Upvotes: 4

iruvar

Reputation: 23364

Assuming # stands by itself in field 3 whenever it occurs, the following solution may work:

awk '$3 == "#"{t=$1; $1=$2=$3=""; sub(/^[[:space:]]+/, ""); $0=t" "$0}; 
     {print}' file.txt

Upvotes: 2

Birei

Reputation: 36262

This looks like an easy parsing job. Here is a solution using perl. It splits each line into fields on whitespace and, if the third field is #, removes it and the previous one:

perl -lane 'if ( $F[2] eq q|#| ) { @F = @F[0,3..$#F] }; print qq|@F|' infile

It yields:

F8FEA8 Technico Japan Corporation
F8FF5F Shenzhen Communication Technology Co.,Ltd
FC0012 Toshiba Samsung Storage Technolgoy Korea Corporation
FC019E Vievu
FC01CD FUNDACION TEKNIKER
FC0647 Cortland Research, LLC
FC0877 PrentkeR
FC0A81 Motorola Solutions Inc.

Upvotes: 2
