Reputation: 5059
I have a tab-delimited file that looks like this:
"""chr1" "38045559" "38046059" "C1orf122"""
"""" "" "" "C1orf122"""
"""" "" "" "YRDC"""
"""chr1" "205291045" "205291545" "YOD1"""
"""chr1" "1499717" "1500625" "SSU72"""
I got this file by converting a .csv file to a tab-separated file with this command:
perl -lpe 's/"/""/g; s/^|$/"/g; s/","/\t/g' <test.csv>test_tab
I want the file to stay tab-separated, but with all the extra quotes removed. When I print column 4 I should get all the names, and for columns 1, 2, and 3 the coordinates (I already get these, but with quotes).
How should I change the above command to do this?
The desired output is (since I was asked to be clear):
chr1 38045559 38046059 C1orf122
C1orf122
YRDC
chr1 205291045 205291545 YOD1
chr1 1499717 1500625 SSU72
so that when I extract column 4 I get:
C1orf122
C1orf122
YRDC
YOD1
SSU72
Thank you
Upvotes: 0
Views: 2502
Reputation: 6566
It appears that most of those quotes are being inserted by your command to bring in the file. Instead open the file normally:
use strict;
use warnings;
open my $csv, '<', 'test.csv' or die "can't open input file: $!";
open my $tab, '>', 'test.tab' or die "can't open output file: $!";
my @row_array;
while (<$csv>)
{
    #Remove the trailing newline so it does not end up in the last field.
    chomp;
    #Remove any quotes that exist on the line (it is in default variable $_).
    s/"//g;
    #Split the current row into an array; the -1 limit keeps trailing empty fields.
    my @fields = split /,/, $_, -1;
    #Write the output, tab-delimited file.
    print {$tab} join ("\t", @fields) . "\n";
    #Put the row into a multidimensional array.
    push @row_array, \@fields;
}
close $tab or die "can't close output file: $!";
print "Column 4:\n";
print $_->[3] . "\n" foreach (@row_array);
print "\nColumns 1-3:\n";
print "@{$_}[0..2]\n" foreach (@row_array);
Any quotes that still exist will be removed by s/"//g; in the above code. This removes all quotes; it doesn't check whether they are at the beginning and end of a field. If you might have quotes within the data that you need to preserve, you would need a more sophisticated matching pattern.
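As a rough illustration of what a more sophisticated pattern could look like, here is a minimal sketch that strips only one enclosing pair of quotes per field and leaves quotes inside the data alone. The helper name csv_line_to_fields is hypothetical, and the sketch assumes no field contains a comma; if your fields can contain commas, a real CSV parser such as the Text::CSV module from CPAN would be a safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): split one CSV line on
# commas and strip a single pair of enclosing double quotes from each field.
# Assumption: no field contains an embedded comma.
sub csv_line_to_fields {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields.
    my @fields = split /,/, $line, -1;
    for my $field (@fields) {
        # Only touch the field if it starts AND ends with a quote.
        if ($field =~ s/^"(.*)"$/$1/s) {
            # Inside a quoted field, a doubled quote stands for one literal quote.
            $field =~ s/""/"/g;
        }
    }
    return @fields;
}

# Prints the four fields separated by tabs, with no quotes.
print join("\t", csv_line_to_fields('"chr1","38045559","38046059","C1orf122"')), "\n";
```

Swapping this in for the s/"//g; and split lines in the loop would keep any quotes that are genuinely part of a value.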
Update: I added code to create a tab-separated output file, since you seem to want that. I don't understand exactly what your requirement about getting "all the names...and the coordinates" is. However, you should be able to use the above code for that. Just add what you need inside the while loop. You can reference, for example, column 1 with $fields[0].
Update 2: Added code to extract column 4, then columns 1-3. The syntax for using multidimensional arrays is tricky. See perldsc and perlref for more information.
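To make the array-of-arrays syntax a little less opaque, here is a small standalone sketch. The two rows are sample data taken from the question's desired output, and the variable names $gene, @coords, and @column4 are only illustrative.

```perl
use strict;
use warnings;

# Sample rows from the question; each inner [ ... ] is an array reference.
my @row_array = (
    [ 'chr1', '38045559', '38046059', 'C1orf122' ],
    [ '',     '',         '',         'YRDC'     ],
);

# One element: row 0, column 4 (index 3). Same as $row_array[0]->[3].
my $gene = $row_array[0][3];

# A slice of one row: dereference with @{ ... }, then take indexes 0..2.
my @coords = @{ $row_array[0] }[0..2];

# One column across all rows: pull index 3 out of each row reference.
my @column4 = map { $_->[3] } @row_array;

print "$gene\n";       # C1orf122
print "@coords\n";     # chr1 38045559 38046059
print "@column4\n";    # C1orf122 YRDC
```

The @{$_}[0..2] expression in the code above is the same slice, applied to each row reference in turn.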
Update 3: Added code to remove the quotes that still exist in your file.
Upvotes: 2