Greg Mathews

Reputation: 33

Insert text into a line with sed if that line doesn't contain another string

I am merging a number of text files on a Linux server, but the lines in some of them differ slightly and I need to unify them.

For example, some files have lines like:

id='1244' group='american' name='fred',american

Other files have lines like:

id='2345' name='frank', english

Finally, others look like:

id='7897' group='' name='maria',scottish

What I need to do is this: if the line has group='' or has no group at all, add a group somewhere before the comma, set to the text after the comma. So the second example above would become:

id='2345' name='frank' group='english',english

and likewise the last example would become:

id='7897' name='maria' group='scottish',scottish

This is going into a bash script. I can't simply delete the line and append a rewritten version to the end of the file, because each line relates to the line that follows it.

I've used the following:

sed -i.bak 's#group=""##' file 

which deletes the group="" string, so the lines either contain group='something' or don't contain group at all. That part works.

Then I tried to add the group if it doesn't exist using the following:

sed -i.bak '/group/! s#,(.*$)#group="\1",\1#' file

but that fails with this error:

sed: -e expression #1, char 38: invalid reference \1 on `s' command's RHS
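That error comes from sed's default basic regular expression (BRE) syntax: unescaped `(` and `)` match literal parentheses instead of creating a group, so `\1` has nothing to refer to. Escaping them as `\(` and `\)` (or switching to extended syntax with `-E`/`-r`) removes the error. A minimal sketch, assuming a throwaway file named `file` holding part of the sample input; it also uses double-quoted shell text so the single quotes in `group='...'` can appear literally:

```shell
# Throwaway test file with part of the sample input.
printf '%s\n' \
  "id='1244' group='american' name='fred',american" \
  foo \
  "id='2345' name='frank', english" > file

# \( \) form the capture group in BRE; " *" also swallows the blank after the comma.
sed "/group/! s#, *\(.*\)# group='\1',\1#" file
```

The first two lines come out unchanged, and the third becomes `id='2345' name='frank' group='english',english`.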

EDIT by Ed Morton to create a single sample input file and expected output:

Sample Input:

id='1244' group='american' name='fred',american
foo
id='2345' name='frank', english
bar
id='7897' group='' name='maria',scottish

Expected Output:

id='1244' group='american' name='fred',american
foo
id='2345' name='frank' group='english',english
bar
id='7897' name='maria' group='scottish',scottish

Upvotes: 3

Views: 655

Answers (4)

glenn jackman

Reputation: 246807

sed -r "
    /group=''/ s///                                   # group is empty, remove it
    /group=/!  s/,[[:blank:]]*(.+)/ group='\\1',\\1/  # group is missing, add it
" file

Output:

id='1244' group='american' name='fred',american
foo
id='2345' name='frank' group='english',english
bar
id='7897'  name='maria' group='scottish',scottish

The foo and bar lines are untouched because the s/// command did not match a comma followed by characters.
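Removing `group=''` on its own leaves two spaces after `id='7897'`, as the output above shows. A minimal variant, assuming a sed that accepts `-E` (GNU and BSD sed both do) and a file named `file` containing the sample input, deletes the blank before `group=''` as well, so the result matches the expected output exactly:

```shell
# Sample input from the question.
printf '%s\n' \
  "id='1244' group='american' name='fred',american" \
  foo \
  "id='2345' name='frank', english" \
  bar \
  "id='7897' group='' name='maria',scottish" > file

# 1) Delete an empty group together with the blank before it.
# 2) On lines that still lack group=, build it from the text after the comma.
sed -E "
    s/ group=''//
    /group=/! s/,[[:blank:]]*(.+)/ group='\\1',\\1/
" file
```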

Upvotes: 1

ghoti

Reputation: 46846

Sed can be pretty ugly. And your data format appears to be somewhat inconsistent. This MIGHT work for you:

$ sed -e "/group='[a-z]/b e" -e "s/group='' *//" -e "s/, *\([a-z]*\)$/ group='\1',\1/" -e ':e' input.txt

Broken out for easier reading, here's what we're doing:

  • /group='[a-z]/b e - If the line contains a non-empty group, branch to the end.
  • s/group='' *// - Remove any empty group.
  • s/, *\([a-z]*\)$/ group='\1',\1/ - Add a new group built from the value after the comma (skipping any blanks after the comma).
  • :e - The label that the first command branches to.
  • And then the default action is to print the line.
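The `b e` / `:e` pair is what lets the first command skip the rest of the script. A tiny illustration with made-up input lines `keep` and `change`:

```shell
# Lines matching /keep/ jump straight to the :end label, skipping the substitution.
printf '%s\n' keep change | sed -e '/keep/b end' -e 's/.*/changed/' -e ':end'
```

This prints `keep` untouched and `changed` for the second line.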

I really don't like manipulating data this way. It's prone to error, and you'll be further ahead reading this data into something that accurately stores its data structure, then prints the data according to a new structure. A more robust solution would likely be tied directly to whatever is producing or consuming this data, and would not sit in the middle like this.
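Along the lines of that suggestion, here is a minimal awk sketch (not from the original answer) that parses each line into its key=value pairs, stores them in arrays, and rebuilds the line from those arrays instead of patching text with regexes. It assumes values never contain blanks or commas and that lines without a comma (like foo and bar) should pass through unchanged; the script name `rebuild.awk` and the array names `keys` and `vals` are hypothetical.

```shell
# Sample input from the question.
printf '%s\n' \
  "id='1244' group='american' name='fred',american" \
  foo \
  "id='2345' name='frank', english" \
  bar \
  "id='7897' group='' name='maria',scottish" > file

cat > rebuild.awk <<'EOF'
BEGIN { q = "'" }                      # single-quote character
!/,/ { print; next }                   # no comma (foo, bar): pass through unchanged
{
    c = index($0, ",")
    tail = substr($0, c + 1)
    sub(/^[ \t]+/, "", tail)           # value after the comma, minus leading blanks

    # Store the key=value pairs before the comma, dropping an empty group.
    n = split(substr($0, 1, c - 1), kv, " ")
    m = 0; have_group = 0
    for (j = 1; j <= n; j++) {
        eq = index(kv[j], "=")
        key = substr(kv[j], 1, eq - 1)
        value = substr(kv[j], eq + 1)
        if (key == "group" && value == q q) continue
        if (key == "group") have_group = 1
        m++; keys[m] = key; vals[m] = value
    }
    # No usable group: derive one from the value after the comma.
    if (!have_group) { m++; keys[m] = "group"; vals[m] = q tail q }

    # Rebuild the line from the stored pairs, then append the comma and value.
    out = keys[1] "=" vals[1]
    for (j = 2; j <= m; j++) out = out " " keys[j] "=" vals[j]
    print out "," tail
}
EOF

awk -f rebuild.awk file
```

Because every pair is stored before anything is printed, changing the output layout (for example, always placing group last) only means changing the rebuild loop, not rewriting regexes.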

Upvotes: 0

oliv

Reputation: 13249

This GNU awk may help:

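awk -v sq="'" '
  BEGIN{RS="[ ,\n]+"; FS="="; found=0; line=""}
  $1=="group"{
    if($2==sq sq)
      {next}                  # empty group: drop it
    else
      {found=1}
  }
  NF>1{                       # a key=value pair: append it to the line
    line = line (line=="" ? "" : " ") $1 "=" $2
    next
  }
  NF==1{                      # a bare word: the value after the comma, or a plain line
    if(line=="")
      {print $1; next}        # plain line such as foo or bar
    if(!found)
      {line = line " group=" sq $1 sq}
    print line "," $1
    line=""; found=0
  }
' file

The script relies on the record separator RS, which is set so that each key='value' pair becomes its own record.

If the key group is missing or empty, group='value' is added when the script reaches a record with only one field (the value after the comma). One-field records that arrive before any pair, such as foo and bar, are printed unchanged; this assumes such lines contain no spaces or commas, since those also act as record separators.

Note that the variable sq holds the single quote character and is used both to detect an empty group field and to quote the new group value.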
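To see how that record separator carves up a line, here is a short check, assuming an awk that accepts a regular expression as RS (GNU awk and mawk both do):

```shell
# Print each record with its number to show how RS="[ ,\n]+" splits the line.
printf '%s\n' "id='1244' group='american' name='fred',american" |
  awk 'BEGIN { RS = "[ ,\n]+" } { print NR ": " $0 }'
```

Each key='value' pair lands in its own record (`1: id='1244'` through `3: name='fred'`), and the value after the comma arrives as the one-field record `4: american`.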

Upvotes: 0

Nahuel Fouilleul

Reputation: 19315

Something like:

sed '
    /^[^,]*group[^,]*,/ ! {
        s/, *\(.*\)/ group='\''\1'\'',\1/
    }
    /^[^,]*group='\'\''/ {
        s/ group='\'\''\([^,]*\), *\(.*\)/\1 group='\''\2'\'',\2/
    }
' file
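The `'\''` sequences in that script are the standard way to get a literal single quote inside a single-quoted shell string: `'` closes the string, `\'` adds an escaped quote, and the next `'` reopens it (`'\'\''` does the same with two escaped quotes). A quick way to check what sed actually receives is to print the pieces, for example the second address from the script:

```shell
# Each '\'' closes the quoted text, adds a literal quote, and reopens the quote.
printf '%s\n' 'group='\''x'\'''
# The second address from the answer, exactly as sed receives it.
printf '%s\n' '/^[^,]*group='\'\''/'
```

This prints `group='x'` and `/^[^,]*group=''/`.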

Upvotes: 1
