Reputation: 6366
I have csv files with the following format:
CSV FILE
"a" , "b" , "c" , "d"
hello, world , 1 , 2 , 3
1,2,3,4,5,6,7 , 2 , 456 , 87
h,1231232,3 , 3 , 45 , 44
The problem is that the first field has commas "," in it. I have no control over file generation, as that's the format I receive them in. Is there a way to read a CSV file backwards, from the end of line to the beginning?
I don't mind writing a little python script to do so, if I’m guided in the right direction.
Upvotes: 3
Views: 2493
Reputation: 2398
You could always do something with regex's, like (perl regex)
#!/usr/bin/perl
use IO::File;
if (my $file = new IO::File("test.csv"))
{
foreach my $line (<$file>) {
$line =~ m/^(.*),(.*?),(.*?),(.*?)$/;
print "[$1][$2][$3][$4]\n";
}
} else {
print "Unable to open test.csv\n";
}
(The first is a greedy search, the last 3 are not) Edit: posted full code instead of just the regex
Upvotes: 1
Reputation: 5676
I don't fully understand why you want to read each line in reverse, but you could do this:
import csv
file = open("mycsvfile.csv")
reversedLines = [line[::-1] for line in file]
file.close()
reader = csv.reader(reversedLines)
for backwardRow in reader:
lastField = backwardRow[0][::-1]
secondField = backwardRow[1][::-1]
Upvotes: 4
Reputation: 20360
I agree with mr beer. That is a badly formed csv file. Your best bet is to find other delimiters or stop overloading the commas or quote/escape the non field separating commas
Upvotes: 0
Reputation: 53426
If you always expect the same number of columns, and only the first column can contain commas, just read anything and concatenate excess columns at the beginning.
The problem is that the interface is ambiguous, and you can try to circumvent this, but the better solution is to try to get the interface fixed (which is often harder than creating several patches...).
Upvotes: 0
Reputation: 43094
That's not then a CSV file, comma separated means just that.
How can you be certain that is not:
CSV FILE
"a" , "b" , "c" , "d"
hello , world , 1 , 2 , 3
1 , 2 , 3 , 4 , 5,6,7,2,456,87
h , 1231232 , 3 , 3 , 45,44
If the file is as you indicate then the first group should be surrounded by quotes, looks as though the field names are so odd that fields containing commas are not.
I'm not a fan of fixing errors away from their source, I'd push back to the data generator to deliver proper CSV if that's what they are claiming it is.
Upvotes: 1
Reputation: 74
From the sample You have provided, it looks like "columns" are fixed size. First (the one with commas) is 16 characters long, so why don't You try reading the file line by line, then for each line reading the first 16 characters (as a value of first column), and the rest accordingly? After You have each value, You can go and parse it further (trim whitespaces, and so on...).
Upvotes: 1
Reputation: 193131
The rsplit
string method splits a string starting from the right instead of the left, and so it's probably what you're looking for (it takes an argument specifying the max number of times to split):
line = "hello, world , 1 , 2 , 3"
parts = line.rsplit(",", 3)
print parts # prints ['hello, world ', ' 1 ', ' 2 ', ' 3']
If you want to strip the whitespace from the beginning and end of each item in your splitted list, then you can just use the strip
method with a list comprehension
parts = [s.strip() for s in parts]
print parts # prints ['hello, world', '1', '2', '3']
Upvotes: 16
Reputation: 21088
Reverse the string first and then process it.
tmp = tmp[::-1]
Upvotes: 1