Angelo

Reputation: 5059

removing double quotes from a file

I have a tab-delimited file that looks like this:

"""chr1"    "38045559"  "38046059"  "C1orf122"""
""""    ""  ""  "C1orf122"""
""""    ""  ""  "YRDC"""
"""chr1"    "205291045" "205291545" "YOD1"""
"""chr1"    "1499717"   "1500625"   "SSU72"""

I got this file by converting a .csv file to a tab-separated file with this command:

perl -lpe 's/"/""/g; s/^|$/"/g; s/","/\t/g' <test.csv>test_tab

Now I want the file to remain tab-separated, but with all the extra quotes removed. At the same time, when I print column 4 I should get all the names, and for columns 1, 2, and 3 the coordinates (I do still get these, but with quotes).

What change should I make to the above command to achieve this? Kindly guide me.

The desired output is (since I was asked to be clear):

chr1    38045559    38046059    C1orf122
                                C1orf122
                                YRDC
chr1    205291045   205291545   YOD1
chr1    1499717     1500625     SSU72

so that when I extract column 4 I get:

    C1orf122
    C1orf122
    YRDC 
    YOD1
    SSU72

Thank you

Upvotes: 0

Views: 2502

Answers (1)

dan1111

Reputation: 6566

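It appears that most of those quotes are being inserted by the command you used to convert the file: its first substitution doubles every quote, and the second adds one more quote at the start and end of each line. Instead, read the file normally: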

use strict;
use warnings;

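# Three-argument open with lexical filehandles; $! reports why an open failed.
open my $csv, '<', 'test.csv' or die "Can't open input file: $!";
open my $tab, '>', 'test.tab' or die "Can't open output file: $!";

my @row_array;

while (<$csv>)
{
    # Remove the trailing newline so it does not stay attached to the last field.
    chomp;

    # Remove any quotes that exist on the line (it is in the default variable $_).
    s/"//g;

    # Split the current row into an array.
    my @fields = split /,/;

    # Write the row to the tab-delimited output file.
    print {$tab} join("\t", @fields), "\n";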

    #Put the row into a multidimensional array.
    push @row_array, \@fields;
}

print "Column 4:\n";
print $_->[3] . "\n" foreach (@row_array);

print "\nColumns 1-3:\n";
print "@{$_}[0..2]\n" foreach (@row_array);

Any quotes that still exist will be removed by s/"//g; in the above code. This removes all quotes; it doesn't check whether they are at the beginning and end of a field. If you might have some quotes within the data that you need to preserve, you would need a more sophisticated matching pattern.
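As a rough sketch of what such a pattern could look like, assuming fields never contain commas or line breaks inside the quotes, you could split first and then strip only one enclosing quote from each end of every field. The helper name strip_enclosing_quotes and the sample line are made up for illustration; they are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a CSV line on commas, then remove one
# enclosing quote from each end of every field. Quotes inside a field
# are left alone. Assumes no commas inside quoted fields.
sub strip_enclosing_quotes {
    my ($line) = @_;
    chomp $line;

    # The -1 limit keeps trailing empty fields instead of dropping them.
    my @fields = split /,/, $line, -1;

    # Strip a quote at the very start and at the very end of each field.
    s/^"|"$//g for @fields;

    return @fields;
}

# Made-up sample line with a quote inside the last field.
my $sample = qq{"chr1","38045559","38046059","C1orf"122"\n};
my @sample_fields = strip_enclosing_quotes($sample);
print join("\t", @sample_fields), "\n";
```

Here the quote inside the last field survives while the enclosing ones are dropped. If quoted fields can contain commas, splitting on commas is no longer safe, and a real CSV parser such as the CPAN module Text::CSV is probably the better route.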

Update: I added code to create a tab-separated output file, since you seem to want that. I don't understand exactly what your requirement about getting "all the names...and the coordinates" is. However, you should be able to adapt the above code for that. Inside the loop you can reference, for example, column 1 with $fields[0].

Update 2: Added code to extract column 4, then columns 1-3. The syntax for using multidimensional arrays is tricky. See perldsc and perlref for more information.
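To make that syntax a little more concrete, here is a small standalone sketch of an array of array references. The two rows are sample values taken from the question, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample rows standing in for the parsed file (illustrative data only).
my @rows = (
    [ 'chr1', '38045559',  '38046059',  'C1orf122' ],
    [ 'chr1', '205291045', '205291545', 'YOD1' ],
);

# Each element of @rows is a reference to an array of fields.
my $first_row = $rows[0];

# One element: dereference with the arrow; indexes are zero-based.
print $first_row->[3], "\n";    # column 4 of the first row
print $rows[1][3], "\n";        # the arrow is optional between subscripts

# A slice of one row: columns 1-3.
my @coords = @{ $rows[1] }[0 .. 2];
print "@coords\n";

# One column across all rows.
my @names = map { $_->[3] } @rows;
print "@names\n";
```

The @{ ... }[0 .. 2] form is the same slice used in the "Columns 1-3" loop above, just written out for a single row.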

Update 3: Added code to remove the quotes that still exist in your file.

Upvotes: 2
