JDE876
JDE876

Reputation: 407

How can I remove leading and trailing whitespace from all columns but one in a CSV?

I have a CSV that looks like this:

things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff

How can I remove leading and trailing whitespace from all columns except for the second (ID)? The final output would look like this:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff

I tried using the following regex, but it removes spaces from all fields, including those in the ID column.

s/( +,|, +)/,/gi;

Upvotes: 3

Views: 1119

Answers (5)

G. Cito
G. Cito

Reputation: 6378

I prefer @Miller's answer which uses regular expressions as the OP requested, but there is also Text::Trim when needed:

perl -MText::Trim -F, -anE 'trim for @F[0,2..$#F]; say join ",", @F' test.csv

or:

use Text::Trim;
for (<>){
  my @line = split(/,/);
  trim for @line[0,2..$#line];
  print join",", @line, "\n";
}

I hope I'm not hijacking the thread, but I'm trying to explain to myself why Text::Trim works here but String::Util qw/trim/ does not. And, more to the OP's question, why one works just like applying an s// (i.e. an expression) to the iterated value and the other doesn't. I think it has something to do with modifying the original value of the string. i.e the String::Util version of trim is more akin to using the post 5.14 "non destructive substitution flag" a.k.a. "/r": s/^\s+|\s+$//rg where as Text::Trim trims more directly ...

In any case Text::Trim uses this regexp:

s/\A\s+//; s/\s+\z// ;    

(along with wantarray etc.) where as String::Util's trim sub is errm different ... perhaps that's useful here ;-)

Upvotes: 0

David W.
David W.

Reputation: 107040

You can specify each field in the substitution:

#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);

for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*(\S+)\s*,   # Things: trim off the spaces
        (.+?),                # ID: Leave alone
        \s*(\S+)\s*,          # Hello Field: trim off spaces
        \s*(\S+)\s*           # More things: trim off spaces
        /$1,$2,$3,$4/x;
    say $line;
}

__DATA__
things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff   
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff

Here, I'm using x on the end of the regular expression which allows me to break the expression onto multiple lines.

This produces:

things,ID,hello_field,morethings
stuff,123  ,hello,morestuff
stuff,123 ,hello,morestuff
stuff,123  ,hello,morestuff
stuff,123  ,hello,morestuff
stuff,123,hello,morestuff   
stuff,123,hello,morestuff
stuff,123,hello,morestuff

I was thinking about using named capture groups. They're good if you are moving things around and you have lots of capture groups. However, in this case, I don't believe it makes things any easier to read:

#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);

for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*(?<things>\S+)\s*,       # Things: trim off the spaces
        (?<id>.+?),                         # ID: Leave alone
        \s*(?<hello_field>\S+)\s*,          # Hello Field: trim off spaces
        \s*(?<more_things>\S+)\s*           # More things: trim off spaces
        /$+{things},$+{id},$+{hello_field},$+{more_things}/x;
    say $line;
}

__DATA__
things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff   
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff

Upvotes: 0

konsolebox
konsolebox

Reputation: 75458

Using awk:

awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) if (i != 2) { sub(/^[ \t]+/, "", $i); sub(/[ \t]+$/, "", $i) } } 1' file

Output:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff

What it does:

  • Sets field separator and output field separator to ,.
  • Traverse the field values. If field number is not 2, trim out leading and trailing spaces.
  • Print.

Upvotes: 0

anurag
anurag

Reputation: 202

Although i have stored the stuffs in the variable, but you can use it as you want. So, try this:

#!/usr/bin/perl
use strict;
use Data::Dumper;

my $str="things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff";

$str=join("\n",map{my ($a,$b,$c)=($1,$2,$3) if($_=~/(.*?),(.*?),(.*)/is);$a=~s/^\s*|\s$//sg;$c=~s/\s*,\s*/,/sg;$_=join(",",$a,$b,$c);$_} split /\n/i,$str);

print $str;

Output:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff

Upvotes: -1

Miller
Miller

Reputation: 35198

Split, trim selectively, rejoin

perl -F, -lane 's/^\s+|\s+$//g for @F[0,2..$#F]; print join ",", @F' file.csv

Explanation:

Switches:

  • -F/pattern/: split() pattern for -a switch (//'s are optional)
  • -l: Enable line ending processing
  • -a: Splits the line on space and loads them in an array @F
  • -n: Creates a while(<>){...} loop for each line in your input file.
  • -e: Tells perl to execute the code on command line.

Code:

  • EXPR for @F[0,2..$#F]: Iterate over array slice (skipping 2nd field)
  • s/^\s+|\s+$//g: Remove leading and trailing spaces from fields
  • print join ",", @F: Print the results

Upvotes: 3

Related Questions