Jay

Reputation: 21

How to read strings as integers when reading from a file in python

I have code (shown below) that reads in a specific part of a text file. The problem is that these values are numbers, not strings, so I want to convert them to ints and read them into a list of some sort.

A sample of the data from the text file is shown below. It is not wholly representative, however, so I have uploaded the full set of data as a text file here: http://s000.tinyupload.com/?file_id=08754130146692169643

*NSET, NSET=Nodes_Pushed_Back_IB

99915527, 99915529, 99915530, 99915532, 99915533, 99915548, 99915549, 99915550, 99915551, 99915552, 99915553, 99915554, 99915555, 99915556, 99915557, 99915558, 99915562, 99915563, 99915564, 99915656, 99915657, 99915658, 99915659, 99915660, 99915661, 99915662, 99915663, 99915664, 99915665, 99915666, 99915667, 99915668, 99915669, 99915670, 99915885, 99915886, 99915887, 99915888, 99915889, 99915890, 99915891, 99915892, 99915893, 99915894, 99915895, 99915896, 99915897, 99915898, 99915899, 99915900, 99916042, 99916043, 99916044, 99916045, 99916046, 99916047, 99916048, 99916049, 99916050

*NSET, NSET=Nodes_Pushed_Back_OB

Any help would be much appreciated.

Hi, I am still stuck with this issue. Any more suggestions? My latest code and the error message are below. Thanks!

 import tkinter as tk
 from tkinter import filedialog
 file_path = filedialog.askopenfilename()
 print(file_path)
 data =  []
 data2 = []
 data3 = []
 flag= False
 with open(file_path,'r') as f:
     for line in f:
         if line.strip().startswith('*NSET, NSET=Nodes_Pushed_Back_IB'):
             flag= True
         elif line.strip().endswith('*NSET, NSET=Nodes_Pushed_Back_OB'):
             flag= False    #loop stops when condition is false i.e if false do nothing
         elif flag:          # as long as flag is true append
             data.append([int(x) for x in line.strip().split(',')]) 

This results in the following error:

 ValueError: invalid literal for int() with base 10: ''

Instead of reading these as strings, I would like each one to be a number in a list, i.e. [98932850, 98932852, 98932853, 98932855, 98932856, 98932871, 98932872, 98932873]

Upvotes: 1

Views: 227

Answers (3)

gregory

Reputation: 13033

Using a sample of your string:

strings = '  98932850,  98932852,  98932853,  98932855,  98932856,  98932871,  98932872,  98932873,\n'

I'd just split the string, strip the commas, and return a list of numbers:

numbers = [ int(s.strip(',')) for s in strings.split() ]
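The ValueError in the question comes from the same trailing comma: line.strip() removes the newline but keeps the final comma, so '98932850,  98932852,'.split(',') ends with an empty string, and int('') is exactly the call that fails. Below is a minimal sketch of the question's flag-based loop that simply skips empty fields. read_nset is a hypothetical helper name; it accepts any iterable of lines (an open file object works) so it can be tried on an in-memory sample. Leading spaces are harmless because int() ignores surrounding whitespace.

```python
def read_nset(lines, start_header, stop_header):
    """Collect the integers listed between two *NSET header lines.

    `lines` can be any iterable of strings, e.g. an open file object.
    """
    data = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(start_header):
            inside = True
        elif stripped.startswith(stop_header):
            inside = False
        elif inside:
            # A trailing comma leaves an empty last field after split(','),
            # and int('') raises the ValueError, so empty fields are skipped.
            data.extend(int(x) for x in stripped.split(',') if x.strip())
    return data


sample = [
    '*NSET, NSET=Nodes_Pushed_Back_IB\n',
    '  98932850,  98932852,  98932853,\n',
    '  98932855,  98932856\n',
    '*NSET, NSET=Nodes_Pushed_Back_OB\n',
    '  1,  2\n',
]
print(read_nset(sample, '*NSET, NSET=Nodes_Pushed_Back_IB',
                '*NSET, NSET=Nodes_Pushed_Back_OB'))
# [98932850, 98932852, 98932853, 98932855, 98932856]
```

With the real file, pass the open file object instead of the sample list, e.g. inside `with open(file_path) as f:`. Using extend rather than append gives one flat list of numbers instead of one list per line.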

Based on your comment, and given the larger context of your code, I'd suggest something like this:

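from itertools import groupby

number_lines = []
with open('data.txt', 'r') as f:
    # groupby splits the file into alternating runs of '*NSET' header lines
    # and the data lines that follow them
    for is_header, lines in groupby(f, key=lambda x: x.startswith('*NSET')):
        if not is_header:
            # keep the data lines, dropping blank ones
            number_lines += [line for line in lines if line.strip()]

data = []
for line in number_lines:
    for str_num in line.split(','):
        if str_num.strip():  # skip the empty field left by a trailing comma
            data.append(int(str_num))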
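The loop above gathers the numbers from every *NSET block into one list. If only the Nodes_Pushed_Back_IB set is wanted, one option is to remember which header each run of data lines follows. This is a minimal sketch, assuming every header looks exactly like *NSET, NSET=<name> with no further parameters; nsets_by_name is a hypothetical helper name.

```python
from itertools import groupby


def nsets_by_name(lines):
    """Map each '*NSET, NSET=<name>' header to the integers listed under it."""
    sets = {}
    name = None  # data lines before the first header are ignored
    for is_header, run in groupby(lines, key=lambda x: x.startswith('*NSET')):
        if is_header:
            # Consecutive headers arrive as one run: register each name,
            # and let the last one own the data lines that follow.
            for header in run:
                name = header.partition('NSET=')[2].strip()
                sets.setdefault(name, [])
        elif name is not None:
            for line in run:
                # skip empty fields from trailing commas and blank lines
                sets[name].extend(int(x) for x in line.split(',') if x.strip())
    return sets


sample = [
    '*NSET, NSET=Nodes_Pushed_Back_IB\n',
    '  98932850,  98932852,\n',
    '\n',
    '  98932853\n',
    '*NSET, NSET=Nodes_Pushed_Back_OB\n',
    '  1,  2,\n',
]
print(nsets_by_name(sample)['Nodes_Pushed_Back_IB'])
# [98932850, 98932852, 98932853]
```

Passing the open file object instead of the sample list works the same way, since groupby only needs an iterable of lines.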

Upvotes: 1

Yann Vernier

Reputation: 15887

Your line contains more than one number, and some separating characters. You could parse that format by judicious application of split and perhaps strip, or you could minimize string handling by having re extract specifically the fields you care about:

import re

ints = list(map(int, re.findall(r'-?\d+', line)))

This regular expression will find each group of digits, optionally prefixed by a minus sign, and then map will apply int to each such group found.
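A small sketch of that idea applied to one data line (ints_from_line is a hypothetical name). Because findall only pulls out runs of digits, separators and trailing commas never reach int(). The flip side is that it should only be applied to data lines: a header such as *NSET, NSET=Set-1 (an invented set name, used only for illustration) would yield -1.

```python
import re

# A run of digits, optionally preceded by a minus sign.
NUMBER = re.compile(r'-?\d+')


def ints_from_line(line):
    """Return every integer found in line, ignoring commas and whitespace."""
    return [int(tok) for tok in NUMBER.findall(line)]


print(ints_from_line('  98932850,  98932852,  98932853,\n'))
# [98932850, 98932852, 98932853]
print(ints_from_line('*NSET, NSET=Set-1'))
# [-1]
```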

Upvotes: 1

Israel Unterman

Reputation: 13520

In such cases I use regular expressions together with string methods. I would solve this problem like so:

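import re

with open(file_path) as f:  # file_path as chosen in the question's code
    txt = f.read()

# Capture lazily from the IB header up to the next '*' (or the end of the file),
# so the numbers of the following *NSET block are not included.
g = re.search(r'NSET=Nodes_Pushed_Back_IB(.*?)(?:\*|\Z)', txt, re.S)
snums = g.group(1).replace(',', ' ').split()
numbers = [int(num) for num in snums]

I read the entire text into txt. Next, I use a regular expression with the last portion of your header as an anchor, and capture in parentheses everything after it up to the next * (the start of the following *NSET header) or the end of the file. The re.S flag makes the dot match newlines as well, and .*? is non-greedy, so the capture stops at the first * instead of running on into the next set, whose header text would make int() fail. I then access all the numbers as one unit of text via g.group(1).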

Next, I remove all the commas (actually, I replace them with spaces), because I then call split() on the resulting text. split() is an excellent function for text items separated by whitespace: it doesn't matter how many spaces there are, it splits the text just as you would intend.

The rest is just converting the text to numbers using a list comprehension.
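The same approach can be wrapped in a function that takes the set name as a parameter. In this sketch (numbers_in_nset is a hypothetical name), re.escape protects any regex metacharacters in the name, and anchoring the header to the end of its line with re.M stops Nodes_Pushed_Back_IB from also matching a longer name such as Nodes_Pushed_Back_IB2 (invented here for illustration). It assumes each header sits on its own line as *NSET, NSET=<name>, and that the data ends at the next line starting with *.

```python
import re


def numbers_in_nset(txt, name):
    """Return the integers listed under '*NSET, NSET=<name>' in txt.

    Returns an empty list if the set is not found.
    """
    pattern = (
        r'^\*NSET,\s*NSET=' + re.escape(name) + r'[ \t]*$'  # the header line itself
        r'(.*?)'                                           # lazily capture the data
        r'(?=^\*|\Z)'                                      # stop at the next '*' line or end of text
    )
    m = re.search(pattern, txt, re.S | re.M)
    if m is None:
        return []
    return [int(num) for num in m.group(1).replace(',', ' ').split()]


sample = (
    '*NSET, NSET=Nodes_Pushed_Back_IB\n'
    '  98932850,  98932852,\n'
    '  98932853\n'
    '*NSET, NSET=Nodes_Pushed_Back_IB2\n'
    '  7\n'
    '*NSET, NSET=Nodes_Pushed_Back_OB\n'
    '  1,  2\n'
)
print(numbers_in_nset(sample, 'Nodes_Pushed_Back_IB'))
# [98932850, 98932852, 98932853]
```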

Upvotes: 1
