Reputation: 27329
I have a text file like this, with about 5000 lines:
5.6 4.5 6.8 "6.5"
5.4 8.3 1.2 "9.3"
So the last field is a number enclosed in double quotes.
What I want to do, using Python if possible, is assign the four columns to double (float) variables. The main problem is the last field: I have found no way to remove the double quotes around the number. Is this possible on Linux?
This is what I tried:
#!/usr/bin/python
import os,sys,re,string,array
name=sys.argv[1]
infile = open(name,"r")
cont = 0
while 1:
    line = infile.readline()
    if not line: break
    l = re.split("\s+",string.strip(line)).replace('\"','')
    cont = cont +1
    a = l[0]
    b = l[1]
    c = l[2]
    d = l[3]
Upvotes: 24
Views: 89039
Reputation: 101
I think the easiest and most efficient thing to do would be to slice it!
In your code,
d = l[3]
returns the string "6.5", quotes included,
so you simply add another statement:
d = d[1:-1]
Now d is 6.5, without the leading and trailing double quotes.
Voilà! :)
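Here is a minimal Python 3 sketch of the slicing approach, assuming every line has exactly four whitespace-separated fields and the last one is always wrapped in quotes (parse_line is a hypothetical helper name):

```python
def parse_line(line):
    """Split a line into four floats, slicing the quotes off the last field."""
    # split() with no argument also discards the trailing newline
    a, b, c, d = line.split()
    # d is '"6.5"' here; [1:-1] drops its first and last character
    d = d[1:-1]
    return float(a), float(b), float(c), float(d)

print(parse_line('5.6 4.5 6.8 "6.5"\n'))
```

Note that slicing does not check that the quotes are actually there: an unquoted 6.5 would be cut down to "." and float() would then raise ValueError.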
Upvotes: 1
Reputation: 103
I essentially used this to remove the " characters from "25":
result = result.strip("\"") # remove double-quote characters from both ends
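Applied to every field of a line, the same str.strip call might look like this (strip_quotes is a hypothetical helper name). Unlike slicing, strip('"') only removes quote characters that are actually present, so unquoted fields pass through unchanged:

```python
def strip_quotes(line):
    """Return the whitespace-separated fields of line with surrounding quotes removed."""
    # split() also drops the trailing newline, so it cannot block strip('"')
    return [field.strip('"') for field in line.split()]

fields = strip_quotes('5.6 4.5 6.8 "6.5"\n')
print(fields)                      # ['5.6', '4.5', '6.8', '6.5']
print([float(f) for f in fields])  # [5.6, 4.5, 6.8, 6.5]
```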
Upvotes: 5
Reputation: 1
IMHO, the most universal double-quote stripper is this (it needs import csv first):
In [1]: s = '1 " 1 2" 0 a "3 4 5 " 6'
In [2]: [i[0].strip() for i in csv.reader(s, delimiter=' ') if i != ['', '']]
Out[2]: ['1', '1 2', '0', 'a', '3 4 5', '6']
Upvotes: 0
Reputation: 33974
The csv module (standard library) does it automatically, although the docs aren't very specific about skipinitialspace:
>>> import csv
>>> with open(name, 'rb') as f:
...     for row in csv.reader(f, delimiter=' ', skipinitialspace=True):
...         print '|'.join(row)
...
5.6|4.5|6.8|6.5
5.4|8.3|1.2|9.3
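A Python 3 sketch of the same idea, which also converts the fields to floats. read_rows is a hypothetical helper name; it accepts any iterable of lines, so a file opened with open(name, newline='') works as well as the io.StringIO used here for demonstration:

```python
import csv
import io

def read_rows(lines):
    """Parse space-separated rows; csv.reader removes the surrounding quotes."""
    # skipinitialspace=True tolerates runs of several spaces between fields
    reader = csv.reader(lines, delimiter=' ', skipinitialspace=True)
    # blank lines come back as empty rows, so skip them
    return [[float(field) for field in row] for row in reader if row]

sample = io.StringIO('5.6 4.5 6.8 "6.5"\n5.4 8.3 1.2 "9.3"\n')
print(read_rows(sample))  # [[5.6, 4.5, 6.8, 6.5], [5.4, 8.3, 1.2, 9.3]]
```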
Upvotes: 11
Reputation: 319531
for line in open(fname):
    line = line.split()
    line[-1] = line[-1].strip('"\n')
    floats = [float(i) for i in line]
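As written, floats is overwritten on each pass, which is fine if you use it inside the loop. If you would rather keep every row, a Python 3 sketch could collect them into a list (load_floats is a hypothetical name). Since split() already removes the newline, stripping just '"' is enough:

```python
import io

def load_floats(lines):
    """Collect one list of floats per line, stripping quotes from the last field."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        fields[-1] = fields[-1].strip('"')
        rows.append([float(x) for x in fields])
    return rows

# With a real file: with open(fname) as f: rows = load_floats(f)
rows = load_floats(io.StringIO('5.6 4.5 6.8 "6.5"\n5.4 8.3 1.2 "9.3"\n'))
print(rows)
```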
Another option is to use the built-in module intended for exactly this task, namely csv:
>>> import csv
>>> for line in csv.reader(open(fname), delimiter=' '):
...     print([float(i) for i in line])
...
[5.6, 4.5, 6.8, 6.5]
[5.4, 8.3, 1.2, 9.3]
Upvotes: 9
Reputation: 72748
There's a module in the standard library for this, called shlex:
>>> import shlex
>>> print shlex.split('5.6 4.5 6.8 "6.5"')
['5.6', '4.5', '6.8', '6.5']
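In Python 3 the same call works per line; a sketch that also converts the tokens to floats (parse_with_shlex is a hypothetical name). shlex.split treats the trailing newline as ordinary whitespace, so there is no need to strip it first:

```python
import shlex

def parse_with_shlex(line):
    """Tokenize like a POSIX shell (quotes removed), then convert to floats."""
    return [float(token) for token in shlex.split(line)]

print(parse_with_shlex('5.6 4.5 6.8 "6.5"\n'))  # [5.6, 4.5, 6.8, 6.5]
```

shlex is written in pure Python, so it is slower than str.split, although for a few thousand lines that is unlikely to matter.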
Upvotes: 14
Reputation: 4410
Or you can simply replace your line
l = re.split("\s+",string.strip(line)).replace('\"','')
with this:
l = re.split('[\s"]+',string.strip(line))
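string.strip no longer exists in Python 3, so here is a sketch of the same split using str.strip and a raw-string pattern (SEPARATORS and split_fields are hypothetical names). Because the line ends with a quote, which the pattern treats as a separator, re.split leaves an empty string at the end. Indexing l[0] to l[3] ignores it, but it matters if you convert the whole list, so this version filters it out:

```python
import re

# Treat any run of whitespace and/or double quotes as one separator
SEPARATORS = re.compile(r'[\s"]+')

def split_fields(line):
    """Split on whitespace and quotes, dropping empty strings left at the ends."""
    return [field for field in SEPARATORS.split(line.strip()) if field]

print(SEPARATORS.split('5.6 4.5 6.8 "6.5"'))  # ['5.6', '4.5', '6.8', '6.5', '']
print(split_fields('5.6 4.5 6.8 "6.5"\n'))    # ['5.6', '4.5', '6.8', '6.5']
```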
Upvotes: 7
Reputation: 7694
You can use a regular expression; try something like this:
import re
re.findall("[0-9.]+", file(name).read())
This will give you a list of all numbers in your file as strings without any quotes.
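Because findall returns one flat list for the whole file, you would still need to regroup it into rows. A sketch assuming exactly four numbers per line (numbers_by_row and its per_row parameter are hypothetical):

```python
import re

def numbers_by_row(text, per_row=4):
    """Find every unsigned decimal number and regroup the flat list into rows."""
    # "[0-9.]+" has no sign or exponent support: -1.5 would come back as 1.5,
    # and 1e-3 would be split into 1 and 3
    values = [float(s) for s in re.findall(r"[0-9.]+", text)]
    return [values[i:i + per_row] for i in range(0, len(values), per_row)]

text = '5.6 4.5 6.8 "6.5"\n5.4 8.3 1.2 "9.3"\n'
print(numbers_by_row(text))  # [[5.6, 4.5, 6.8, 6.5], [5.4, 8.3, 1.2, 9.3]]
```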
Upvotes: 0
Reputation: 375484
for line in open(name, "r"):
    line = line.replace('"', '').strip()
    a, b, c, d = map(float, line.split())
This is kind of bare-bones, and will raise exceptions if (for example) there aren't four values on the line, etc.
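If you would rather skip malformed lines than stop at the first one, a Python 3 sketch could catch the exception per line. The generator name read_four_floats and the skip-and-report policy are my own assumptions; both a wrong field count (from the tuple unpacking) and non-numeric text (from float()) raise ValueError:

```python
import io

def read_four_floats(lines):
    """Yield (a, b, c, d) per line, reporting lines that do not parse."""
    for lineno, line in enumerate(lines, start=1):
        line = line.replace('"', '').strip()
        if not line:
            continue  # ignore blank lines
        try:
            a, b, c, d = map(float, line.split())
        except ValueError as exc:
            print(f"skipping line {lineno}: {exc}")
            continue
        yield a, b, c, d

data = io.StringIO('5.6 4.5 6.8 "6.5"\n5.4 8.3 "9.3"\n5.4 8.3 1.2 "9.3"\n')
print(list(read_four_floats(data)))
```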
Upvotes: 33