dassouki
dassouki

Reputation: 6366

parsing CSV files backwards

I have csv files with the following format:

CSV FILE
"a"             , "b"     , "c" , "d"
hello, world    , 1       , 2   , 3
1,2,3,4,5,6,7   , 2       , 456 , 87
h,1231232,3     , 3       , 45  , 44

The problem is that the first field has commas "," in it. I have no control over file generation, as that's the format I receive them in. Is there a way to read a CSV file backwards, from the end of line to the beginning?

I don't mind writing a little python script to do so, if I’m guided in the right direction.

Upvotes: 3

Views: 2493

Answers (8)

cyberconte
cyberconte

Reputation: 2398

You could always do something with regex's, like (perl regex)

#!/usr/bin/perl

use IO::File;

if (my $file = new IO::File("test.csv"))
{
    foreach my $line (<$file>) {
    $line =~ m/^(.*),(.*?),(.*?),(.*?)$/;
    print "[$1][$2][$3][$4]\n";
    }
} else {
    print "Unable to open test.csv\n";
}

(The first is a greedy search, the last 3 are not) Edit: posted full code instead of just the regex

Upvotes: 1

Greg
Greg

Reputation: 5676

I don't fully understand why you want to read each line in reverse, but you could do this:

import csv
file = open("mycsvfile.csv")
reversedLines = [line[::-1] for line in file]
file.close()
reader = csv.reader(reversedLines)
for backwardRow in reader:
    lastField = backwardRow[0][::-1]
    secondField = backwardRow[1][::-1]

Upvotes: 4

Tim
Tim

Reputation: 20360

I agree with mr beer. That is a badly formed csv file. Your best bet is to find other delimiters or stop overloading the commas or quote/escape the non field separating commas

Upvotes: 0

Toon Krijthe
Toon Krijthe

Reputation: 53426

If you always expect the same number of columns, and only the first column can contain commas, just read anything and concatenate excess columns at the beginning.

The problem is that the interface is ambiguous, and you can try to circumvent this, but the better solution is to try to get the interface fixed (which is often harder than creating several patches...).

Upvotes: 0

Lazarus
Lazarus

Reputation: 43094

That's not then a CSV file, comma separated means just that.

How can you be certain that is not:

CSV FILE
"a"             , "b"     , "c" , "d"
hello           , world   , 1   , 2   , 3
1               , 2       , 3   , 4   , 5,6,7,2,456,87
h               , 1231232 , 3   , 3   , 45,44

If the file is as you indicate then the first group should be surrounded by quotes, looks as though the field names are so odd that fields containing commas are not.

I'm not a fan of fixing errors away from their source, I'd push back to the data generator to deliver proper CSV if that's what they are claiming it is.

Upvotes: 1

mkolodziejski
mkolodziejski

Reputation: 74

From the sample You have provided, it looks like "columns" are fixed size. First (the one with commas) is 16 characters long, so why don't You try reading the file line by line, then for each line reading the first 16 characters (as a value of first column), and the rest accordingly? After You have each value, You can go and parse it further (trim whitespaces, and so on...).

Upvotes: 1

Eli Courtwright
Eli Courtwright

Reputation: 193131

The rsplit string method splits a string starting from the right instead of the left, and so it's probably what you're looking for (it takes an argument specifying the max number of times to split):

line = "hello, world    , 1       , 2   , 3"
parts = line.rsplit(",", 3)
print parts  # prints ['hello, world    ', ' 1       ', ' 2   ', ' 3']

If you want to strip the whitespace from the beginning and end of each item in your splitted list, then you can just use the strip method with a list comprehension

parts = [s.strip() for s in parts]
print parts  # prints ['hello, world', '1', '2', '3']

Upvotes: 16

Kevin LaBranche
Kevin LaBranche

Reputation: 21088

Reverse the string first and then process it.

tmp = tmp[::-1]

Upvotes: 1

Related Questions