materangai

Reputation: 109

Use sed (or similar) to remove anything between repeating patterns

I'm essentially trying to "tidy" a lot of data in a CSV. I don't need any of the information that's in "quotes".

I tried sed 's/".*"/""/', but when a line has more than one quoted section it also removes the commas (and everything else) between them.

I would like to get from this:

1,2,"a",4,"b","c",5

To this:

1,2,,4,,,5

Is there a sed wizard who can help? :)

Upvotes: 3

Views: 165

Answers (3)

Cyrus

Reputation: 88829

With Perl:

perl -p -e 's/".*?"//g' file

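The ? after * makes the quantifier non-greedy, so each match stops at the nearest closing quote instead of the last one on the line.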

Output:

1,2,,4,,,5

Upvotes: 2

RavinderSingh13

Reputation: 133680

Could you please try the following?

awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1' Input_file

The same solution in multi-line form:

awk -v s1="\"" '
BEGIN{
  FS=OFS=","
}
{
  for(i=1;i<=NF;i++){
    if($i~s1){
      $i=""
    }
  }
}
1
'  Input_file

Detailed explanation:

awk -v s1="\"" '         ##Start the awk program and set variable s1 to a double quote (").
BEGIN{                   ##Start the BEGIN section, which runs before any input is read.
  FS=OFS=","             ##Set both the field separator and the output field separator to a comma.
}
{
  for(i=1;i<=NF;i++){    ##Loop over every field of the current line.
    if($i~s1){           ##If the current field contains a double quote, do the following.
      $i=""              ##Set the current field to an empty string.
    }
  }
}
1                        ##1 is an always-true condition, so awk prints the (edited or unedited) line.
'  Input_file            ##Name of the input file.
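One thing worth checking before relying on the field-based approach: it splits on every comma, including commas inside quotes. The sample data in the question has none, so the line below is a made-up input, and the variable names are only illustrative. A rough comparison with a pattern-based removal:

```shell
# Hypothetical input: the quoted field itself contains a comma.
line='1,"a,b",3'

# Field-based: "a,b" is split into two fields ("a and b"), both contain
# a quote, so both are emptied and an extra comma survives.
awk_out=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++) if($i ~ /"/) $i=""} 1')

# Pattern-based: the whole quoted section is removed as one unit.
sed_out=$(printf '%s\n' "$line" | sed 's/"[^"]*"//g')

printf '%s\n' "$awk_out"   # 1,,,3
printf '%s\n' "$sed_out"   # 1,,3
```

If your quoted values never contain commas, both approaches give the same result.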

Upvotes: 2

Wiktor Stribiżew

Reputation: 627262

You may use

sed 's/"[^"]*"//g' file > newfile

A quick demo:

s='1,2,"a",4,"b","c",5'
sed 's/"[^"]*"//g' <<< "$s"
# => 1,2,,4,,,5

Details

The "[^"]*" pattern matches a ", then zero or more characters other than ", and then the closing ". Each match is removed because the replacement (RHS) is empty. The g flag makes sed replace every occurrence on each line, not just the first.
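The effect of the g flag is easy to see by running the same command with and without it. This is only a sketch on the sample line from the question; first_only and all are illustrative variable names.

```shell
line='1,2,"a",4,"b","c",5'

# Without g: only the first quoted section on the line is removed.
first_only=$(printf '%s\n' "$line" | sed 's/"[^"]*"//')

# With g: every quoted section on the line is removed.
all=$(printf '%s\n' "$line" | sed 's/"[^"]*"//g')

printf '%s\n' "$first_only"   # 1,2,,4,"b","c",5
printf '%s\n' "$all"          # 1,2,,4,,,5
```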

Upvotes: 3
