user1742835

Reputation: 165

How do I search and replace using sed while excluding a group of words?

Hello, in the following sed command I need the second pair of parentheses to contain a pattern that will NOT match any of the following words: Inc, The, Ltd, LLC.

The command should split the data in list.txt so that each company name is on its own line. Company names are separated by commas, but sometimes "Inc", "Ltd", "LLC", or "The" follows a company name after a comma and must stay attached to it.

This is a pretty advanced regular expression that I can't seem to get right.

sed -re 's/([a-zA-Z.]), (Need code here)/\1\n\2/g' list.txt

list.txt has the following data:

Electronic Arts, Inc., Electronic Arts Ltd.
Activision Publishing, Inc., ak tronic Software & Services GmbH
Coplin Software
Electronic Arts, Inc.
Electronic Arts, Inc.
In-Fusio
Activision Publishing, Inc.
Domark Ltd.
Electronic Arts, Inc.
Electronic Arts, Inc.
Aspyr Media, Inc., Electronic Arts, Inc.
Activision Deutschland GmbH, Activision Publishing, Inc., ak tronic Software & Services GmbH, Noviy Disk, Square Enix Co., Ltd.
Electronic Arts, Inc.
Electronic Arts, Inc., Electronic Arts Ltd.
Electronic Arts, Inc.
Electronic Arts, Inc.
Electronic Arts, Inc., Electronic Arts Square, K.K., MGM Interactive
Electronic Arts Ltd.

Expected output (notice the commas):

GarageGames, Inc.
The Avalon Hill Game Company
Microforum International, The
Telenet Japan Co., Ltd.
Glu Mobile, Inc.
Warner Bros. Digital Distribution
Atari, Inc.

Upvotes: 3

Views: 337

Answers (4)

Pedro Lobito

Reputation: 98921

Based on your example list.txt, you can try this:

  sed -re 's/(, )?(Inc\.|The|Ltd\.?|LLC)//g' list.txt | tr ',' '\n' | sed -re '/^\s*$/d' | sed -re 's/(^ | $)//g'

OUTPUTS:

Electronic Arts
Electronic Arts
Activision Publishing
ak tronic Software & Services GmbH
Coplin Software
Electronic Arts
Electronic Arts
In-Fusio
Activision Publishing
Domark
Electronic Arts
Electronic Arts
Aspyr Media
Electronic Arts
Activision Deutschland GmbH
Activision Publishing
ak tronic Software & Services GmbH
Noviy Disk
Square Enix Co.
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts Square
K.K.
MGM Interactive

NOTE:

You can pipe the above list to awk to display only unique results, e.g.:

sed -re 's/(, )?(Inc\.|The|Ltd\.?|LLC)//g' list.txt | tr ',' '\n' | sed -re '/^\s*$/d' | sed -re 's/(^ | $)//g' | awk '!seen[$0]++'

Outputs:

Electronic Arts
Activision Publishing
ak tronic Software & Services GmbH
Coplin Software
In-Fusio
Domark
Aspyr Media
Activision Deutschland GmbH
Noviy Disk
Square Enix Co.
Electronic Arts Square
K.K.
MGM Interactive
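
As a side note on the `awk '!seen[$0]++'` idiom: it prints a line only the first time it appears, so unlike `sort -u` it keeps the original order. A small sketch on made-up input (the function name `dedupe_keep_order` is only for illustration and is not part of the answer above):

```shell
# Hypothetical wrapper around the awk idiom.
# seen[$0] is 0 (false) the first time a line is read, so !seen[$0] is true
# and awk prints the line; the ++ then marks the line as seen for later repeats.
dedupe_keep_order() {
    awk '!seen[$0]++'
}

printf '%s\n' 'Electronic Arts' 'Domark' 'Electronic Arts' 'Aspyr Media' | dedupe_keep_order
```

This should print Electronic Arts, Domark and Aspyr Media, each once and in that order.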

Upvotes: 3

hwnd

Reputation: 70732

perl -pe 's/([^,]), (?!Inc|LLC|The|Ltd)/$1\n/g' list.txt

Upvotes: 3

jthill

Reputation: 60295

sed -nr '/^ *([^,]+(, *(Inc\.?|The|Ltd\.?|LLC))?)(,(.*))?/ {
                   s//\1\n\5/
                   P
                   D
}' list.txt
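
To see how the `P`/`D` pair walks through one input line, here is a small sketch that wraps the same script in a function and feeds it a sample line. It assumes GNU sed (for `-r` and for `\n` in the replacement), and the function name `split_names` is hypothetical:

```shell
# Hypothetical wrapper around the sed script; assumes GNU sed.
# s//.../ reuses the address regex: it keeps the first name (with its optional
#         suffix), then inserts a newline, then the rest of the line.
# P       prints the pattern space up to that first newline.
# D       deletes what was printed and reruns the script on the remainder
#         without reading a new input line.
split_names() {
    sed -nr '/^ *([^,]+(, *(Inc\.?|The|Ltd\.?|LLC))?)(,(.*))?/ {
        s//\1\n\5/
        P
        D
    }'
}

printf '%s\n' 'Electronic Arts, Inc., Electronic Arts Square, K.K., MGM Interactive' | split_names
```

With GNU sed this should print four lines: `Electronic Arts, Inc.`, `Electronic Arts Square`, `K.K.` and `MGM Interactive`.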

Upvotes: 1

Adrian Frühwirth

Reputation: 45576

A Perl version:

$ perl -anlF'(?!,[\x20](?:Inc|Ltd|LLC|The).?),[\x20]' -e '$n{$_}++ for @F; END { print join "\n", sort keys %n; }' test.txt
Activision Deutschland GmbH
Activision Publishing, Inc.
Aspyr Media, Inc.
Coplin Software
Domark Ltd.
Electronic Arts Ltd.
Electronic Arts Square
Electronic Arts, Inc.
In-Fusio
K.K.
MGM Interactive
Noviy Disk
Square Enix Co., Ltd.
ak tronic Software & Services GmbH

Upvotes: 0
