J R

Reputation: 181

Python - Convert negative decimals from string to float

I need to read a large number of .txt files, each containing a single decimal (some positive, some negative), and append the values to two arrays (genotypes and phenotypes). I then want to perform some mathematical operations on these arrays in scipy, but the negative sign ('-') seems to be causing problems. Specifically, I cannot convert the arrays to float; I believe the '-' is being read as a string, which causes the following error:

ValueError: could not convert string to float:

Here is my code as it's currently written:

import linecache

gene_array=[]
phen_array=[]

for i in genotype:

   for j in phenotype:

      genotype='/path/g.txt'
      phenotype='/path/p.txt'

      g=linecache.getline(genotype,1)
      p=linecache.getline(phenotype,1)

      p=p.strip()
      g=g.strip()

      gene_array.append(g)
      phen_array.append(p)

gene_array=map(float,gene_array)
phen_array=map(float,phen_array)

I am fairly certain at this point that the negative sign is causing the problem, but it is not clear to me why. Is my use of linecache the problem here? Is there an alternative method that would be better?

The result of

print gene_array

is

['-0.0448022516321286', '-0.0236187263814157', '-0.150505384829925', '-0.00338459268479522', '0.0142429109897682', '0.0286253352284279', '-0.0462358095345649', '0.0286232317578776', '-0.00747425206137217', '0.0231790239373428', '-0.00266935581919541', '0.00825077426011094', '0.0272744527203547', '0.0394829854063242', '0.0233109171715023', '0.165841084392078', '0.00259693465334536', '-0.0342590874424289', '0.0124600520095644', '0.0713627590092807', '-0.0189374898081401', '-0.00112750710611284', '-0.0161387333242288', '0.0227226505624106', '0.0382173405035751', '0.0455518646388402', '-0.0453048799717046', '0.0168570746329513']

Upvotes: 1

Views: 10507

Answers (3)

NPE

Reputation: 500963

There is nothing in the error message to suggest that '-' is the problem. The message shows nothing after the colon, so the most likely reason is that gene_array and/or phen_array contain an empty string ('').

As stated in the documentation, linecache.getline()

will return '' on errors (the terminating newline character will be included for lines that are found).
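To see this behaviour in isolation, here is a minimal sketch. The temporary directory, the one-line g.txt file, and the nope.txt name are assumptions made up for the demonstration; only linecache.getline() and float() come from the question.

```python
import linecache
import os
import tempfile

# Hypothetical one-line data file standing in for the asker's g.txt.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "g.txt")
with open(path, "w") as handle:
    handle.write("-0.0448022516321286\n")

first = linecache.getline(path, 1)   # line exists: text plus trailing newline
second = linecache.getline(path, 2)  # no such line: ''
missing = linecache.getline(os.path.join(tmp_dir, "nope.txt"), 1)  # no such file: ''

value = float(first.strip())         # the leading minus sign parses fine

# float('') is what raises "could not convert string to float:"
try:
    float(second)
    empty_converts = True
except ValueError:
    empty_converts = False
```

So a wrong path or an empty file produces '' silently, and the error only shows up later at the float() call.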

Upvotes: 0

Abhijit

Reputation: 63787

The issue seems to be an empty string or whitespace, as is evident from your error message, which shows nothing after the colon:

ValueError: could not convert string to float:

To make it work, replace the map calls with list comprehensions that skip empty strings:

gene_array=[float(e) for e in gene_array if e]
phen_array=[float(e) for e in phen_array if e]

By empty string I mean that float(" ") or float("") raises a ValueError, so if any item in gene_array or phen_array is empty or contains only whitespace, the conversion to float fails.

There could be several reasons for an empty string, such as:

  • empty or blank line
  • blank line either at the beginning or end
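If you would rather find the offending entries than drop them silently, something along these lines may help. This is only a sketch: split_convertible is a hypothetical helper name, and the sample list is made up from two values in the question plus two deliberately bad entries.

```python
def split_convertible(strings):
    """Return (floats, bad), where bad holds (index, raw string) pairs that float() rejects."""
    floats, bad = [], []
    for index, text in enumerate(strings):
        try:
            floats.append(float(text))
        except ValueError:
            # Remember where the bad entry was so the source file can be traced.
            bad.append((index, text))
    return floats, bad

# Made-up sample: two valid values, one empty string, one whitespace-only string.
sample = ['-0.0448022516321286', '', '0.0142429109897682', ' ']
values, rejected = split_convertible(sample)
```

Knowing the indices matters if each genotype must stay paired with its phenotype: filtering the two lists independently would shift the pairs out of alignment whenever only one of them has a blank entry.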

Upvotes: 3

Konstantin Dinev

Reputation: 34915

The issue is definitely not the negative sign. Python converts strings with a leading negative sign to float without a problem. I suggest you check each of your entries against a float regex and see whether they all pass.
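A rough sketch of such a check follows. The pattern and the failing_entries helper are my own assumptions, not anything from a library. The pattern is deliberately narrower than what float() accepts (float() also allows surrounding whitespace and values such as 'nan' and 'inf'), but it covers plain decimals like yours.

```python
import re

# Assumed pattern: optional sign, digits with an optional fraction
# (or a fraction with no leading digit), then an optional exponent.
FLOAT_RE = re.compile(r'^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?\Z')

def failing_entries(strings):
    """Return (index, raw string) pairs for entries that do not look like a decimal."""
    return [(i, s) for i, s in enumerate(strings) if not FLOAT_RE.match(s)]

# Made-up sample mixing valid decimals with an empty and a whitespace-only entry.
sample = ['-0.0448', '', '0.0142', ' ', '1e-3', '.5']
problems = failing_entries(sample)
```

Any entry that shows up in problems is one to inspect in the corresponding input file.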

Upvotes: 0
