Reputation: 407
I have a CSV that looks like this:
things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
How can I remove leading and trailing whitespace from all columns except for the second (ID
)? The final output would look like this:
things,ID,hello_field,more things
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
I tried using the following regex, but it removes spaces from all fields, including those in the ID
column.
s/( +,|, +)/,/gi;
Upvotes: 3
Views: 1119
Reputation: 6378
I prefer @Miller's answer which uses regular expressions as the OP requested, but there is also Text::Trim
when needed:
perl -MText::Trim -F, -anE 'trim for @F[0,2..$#F]; say join ",", @F' test.csv
or:
use Text::Trim;
for (<>){
my @line = split(/,/);
trim for @line[0,2..$#line];
print join",", @line, "\n";
}
I hope I'm not hijacking the thread, but I'm trying to explain to myself why Text::Trim
works here but String::Util qw/trim/
does not. And, more to the OP's question, why one works just like applying an s//
(i.e. an expression) to the iterated value and the other doesn't. I think it has something to do with modifying the original value of the string. i.e the String::Util
version of trim
is more akin to using the post 5.14 "non destructive substitution flag" a.k.a. "/r"
: s/^\s+|\s+$//rg
where as Text::Trim
trims more directly ...
In any case Text::Trim
uses this regexp:
s/\A\s+//; s/\s+\z// ;
(along with wantarray
etc.) where as String::Util
's trim
sub is errm different ... perhaps that's useful here ;-)
Upvotes: 0
Reputation: 107040
You can specify each field in the substitution:
#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);
for my $line ( <DATA> ) {
chomp $line;
$line =~ s/^\s*(\S+)\s*, # Things: trim off the spaces
(.+?), # ID: Leave alone
\s*(\S+)\s*, # Hello Field: trim off spaces
\s*(\S+)\s* # More things: trim off spaces
/$1,$2,$3,$4/x;
say $line;
}
__DATA__
things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
Here, I'm using x
on the end of the regular expression which allows me to break the expression onto multiple lines.
This produces:
things,ID,hello_field,morethings
stuff,123 ,hello,morestuff
stuff,123 ,hello,morestuff
stuff,123 ,hello,morestuff
stuff,123 ,hello,morestuff
stuff,123,hello,morestuff
stuff,123,hello,morestuff
stuff,123,hello,morestuff
I was thinking about using named capture groups. They're good if you are moving things around and you have lots of capture groups. However, in this case, I don't believe it makes things any easier to read:
#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);
for my $line ( <DATA> ) {
chomp $line;
$line =~ s/^\s*(?<things>\S+)\s*, # Things: trim off the spaces
(?<id>.+?), # ID: Leave alone
\s*(?<hello_field>\S+)\s*, # Hello Field: trim off spaces
\s*(?<more_things>\S+)\s* # More things: trim off spaces
/$+{things},$+{id},$+{hello_field},$+{more_things}/x;
say $line;
}
__DATA__
things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
Upvotes: 0
Reputation: 75458
awk
:awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) if (i != 2) { sub(/^[ \t]+/, "", $i); sub(/[ \t]+$/, "", $i) } } 1' file
things,ID,hello_field,more things
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
,
.Upvotes: 0
Reputation: 202
Although i have stored the stuffs in the variable, but you can use it as you want. So, try this:
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $str="things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff";
$str=join("\n",map{my ($a,$b,$c)=($1,$2,$3) if($_=~/(.*?),(.*?),(.*)/is);$a=~s/^\s*|\s$//sg;$c=~s/\s*,\s*/,/sg;$_=join(",",$a,$b,$c);$_} split /\n/i,$str);
print $str;
Output:
things,ID,hello_field,more things
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
Upvotes: -1
Reputation: 35198
Split, trim selectively, rejoin
perl -F, -lane 's/^\s+|\s+$//g for @F[0,2..$#F]; print join ",", @F' file.csv
Switches:
-F/pattern/
: split()
pattern for -a
switch (//
's are optional)-l
: Enable line ending processing-a
: Splits the line on space and loads them in an array @F
-n
: Creates a while(<>){...}
loop for each line in your input file. -e
: Tells perl
to execute the code on command line. Code:
EXPR for @F[0,2..$#F]
: Iterate over array slice (skipping 2nd field)s/^\s+|\s+$//g
: Remove leading and trailing spaces from fieldsprint join ",", @F
: Print the resultsUpvotes: 3