Vikalp Jain

Reputation: 603

Reading two columns of numbers from a text file in Python

I have a text file that looks like this (only the first few rows are shown):

x   y
4   4
2   5
8   5
8   5
4   5
6   7

I need to read this file and plot a graph of x versus y. This is what my code looks like:

import numpy as np
import matplotlib.pyplot as plt

with open("C:\Vikalp\Learning\Machine Learning\Practice\carstopping.txt") as f:
    next(f)
    data = f.read()

data = data.split('\n')

x = [(row.split('\t')[0]).strip() for row in data]
print(x)

y = [row.split('\t')[1] for row in data]

My print(x) statement prints a lot of \x00 characters around the values:

['\x00', '\x004\x00', '\x00', '\x002\x00', '\x00', '\x008\x00', '\x00', '\x008\x00', '\x00', '\x004\x00', '\x00', '\x006\x00', '\x00', '\x007\x00', '\x00', '\x009\x00', '\x00', '\x008\x00', '\x00', '\x001\x003\x00', '\x00', '\x001\x001\x00', '\x00', '\x005\x00', '\x00', '\x005\x00', '\x00', '\x001\x003\x00', '\x00', '\x008\x00', '\x00', '\x001\x007\x00', '\x00', '\x001\x004\x00', '\x00', '\x001\x001\x00', '\x00', '\x002\x001\x00', '\x00', '\x001\x009\x00', '\x00', '\x001\x008\x00', '\x00', '\x002\x007\x00', '\x00', '\x001\x005\x00', '\x00', '\x001\x004\x00', '\x00', '\x001\x006\x00', '\x00', '\x001\x006\x00', '\x00', '\x001\x009\x00', '\x00', '\x001\x004\x00', '\x00', '\x003\x004\x00', '\x00', '\x002\x009\x00', '\x00', '\x002\x002\x00', '\x00', '\x004\x007\x00', '\x00', '\x002\x009\x00', '\x00', '\x003\x004\x00', '\x00', '\x003\x000\x00', '\x00', '\x004\x008\x00', '\x00', '\x005\x005\x00', '\x00', '\x003\x009\x00', '\x00', '\x004\x002\x00', '\x00', '\x003\x005\x00', '\x00', '\x005\x006\x00', '\x00', '\x003\x003\x00', '\x00', '\x005\x009\x00', '\x00', '\x004\x008\x00', '\x00', '\x005\x006\x00', '\x00', '\x003\x009\x00', '\x00', '\x004\x001\x00', '\x00', '\x007\x008\x00', '\x00', '\x005\x007\x00', '\x00', '\x006\x004\x00', '\x00', '\x008\x004\x00', '\x00', '\x006\x008\x00', '\x00', '\x005\x004\x00', '\x00', '\x006\x000\x00', '\x00', '\x001\x000\x001\x00', '\x00', '\x006\x007\x00', '\x00', '\x007\x007\x00', '\x00', '\x008\x005\x00', '\x00', '\x001\x000\x007\x00', '\x00', '\x007\x009\x00', '\x00', '\x001\x003\x008\x00', '\x00', '\x001\x001\x000\x00', '\x00', '\x001\x003\x004\x00', '\x00', '\x00']

How do I get rid of all these special characters?

Edit

Based on the suggestion, I modified my code to the following:

import numpy as np
import matplotlib.pyplot as plt

file_data = np.genfromtxt("C:\Vikalp\Learning\Machine Learning\Practice\carstopping.txt", usecols=(0,1), skip_header=1, dtype=str)
print(file_data)
x = file_data[:,0]
print(x)

y = file_data[:,1]
print(y)

This is what I get in console:

[['\x004' '\x004']
 ['\x002' '\x005']
 ['\x008' '\x005']
 ..., 
 ['\x001\x003\x008' '\x003\x009']
 ['\x001\x001\x000' '\x004\x000']
 ['\x001\x003\x004' '\x004\x000']]
['\x004' '\x002' '\x008' ..., '\x001\x003\x008' '\x001\x001\x000'
 '\x001\x003\x004']
['\x004' '\x005' '\x005' ..., '\x003\x009' '\x004\x000' '\x004\x000']

I am not sure why I am getting all these characters. To get rid of them, I added the following lines:

x = str(x).replace('\\x00','')
y = str(y).replace('\\x00','')

With this I get below output in console:

[['\x004' '\x004']
 ['\x002' '\x005']
 ['\x008' '\x005']
 ..., 
 ['\x001\x003\x008' '\x003\x009']
 ['\x001\x001\x000' '\x004\x000']
 ['\x001\x003\x004' '\x004\x000']]
['4' '2' '8' ..., '138' '110'
 '134']
['4' '5' '5' ..., '39' '40' '40']

So x and y now look like lists of strings. I am not sure how to convert them to integers. I tried the following:

x = list(map(int,x))

This gives the following error:

  File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Vikalp/Learning/Machine Learning/Practice/lr_practice_1.py", line 28, in <module>
    x = list(map(int,x))

ValueError: invalid literal for int() with base 10: '['

I have these three questions:

  1. How do I deal with special characters like \x00, and why are they appearing? The text file looks clean.
  2. How do I convert a list of strings to a list of ints?
  3. What is the best way to write this code?

Upvotes: 3

Views: 11534

Answers (3)

Vibhutha Kumarage

Reputation: 1399

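Your file is encoded as UTF-16-LE, so you need to pass the encoding argument when opening it. Each character is stored as two bytes, and the \x00 values you see are the second byte of each character being read as a character of its own.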

import numpy as np
import matplotlib.pyplot as plt
import codecs

filecp = codecs.open('carstopping.txt', encoding='utf-16-le')
file_data = np.loadtxt(filecp, usecols=(0, 1), skiprows=1)
print(file_data)
x = file_data[:,0]
print(x)

y = file_data[:,1]
print(y)
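
To illustrate why the \x00 characters show up, here is a minimal standard-library sketch. It is not the original file: the sample contents and the temporary file name are assumptions made up for the demonstration. It writes a few rows as UTF-16-LE, reads them back once with a single-byte codec (which exposes the null bytes) and once with the matching codec (which gives clean text that converts directly to integers).

```python
import os
import tempfile

# Hypothetical sample resembling the file in the question (tab-separated, with a header).
sample = "x\ty\n4\t4\n2\t5\n8\t5\n"
path = os.path.join(tempfile.mkdtemp(), "sample_utf16.txt")
with open(path, "wb") as f:
    f.write(sample.encode("utf-16-le"))

# Decoding with a single-byte codec turns the second byte of each character into '\x00'.
with open(path, encoding="latin-1") as f:
    wrong = f.read()

# Decoding with the matching codec gives clean text.
with open(path, encoding="utf-16-le") as f:
    next(f)  # skip the header row
    rows = [line.split() for line in f if line.strip()]

# The cleaned strings convert directly to integers.
x = [int(r[0]) for r in rows]
y = [int(r[1]) for r in rows]

print(repr(wrong[:8]))
print(x, y)
```

Recent NumPy versions also accept an encoding argument on np.loadtxt and np.genfromtxt, which may remove the need to open the file by hand.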

Upvotes: 2

mastakenn

Reputation: 1

I would use numpy.loadtxt as suggested. However, I think you need to set the delimiter argument.

For example:

import numpy as np
array_txt = np.loadtxt(
    r"C:\Vikalp\Learning\Machine Learning\Practice\carstopping.txt",
    usecols=(0, 1), delimiter=','
)

By default the delimiter is whitespace.
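
As a small sketch of how the delimiter argument behaves, the following uses in-memory text instead of a real file. The sample strings are assumptions for illustration, not the contents of carstopping.txt. Tab-separated data parses with the default whitespace splitting, while comma-separated data needs delimiter=','.

```python
import io

import numpy as np

# Hypothetical samples: one tab-separated, one comma-separated.
tab_text = "x\ty\n4\t4\n2\t5\n"
comma_text = "x,y\n4,4\n2,5\n"

# Default delimiter: any run of whitespace, which includes tabs.
tab_data = np.loadtxt(io.StringIO(tab_text), skiprows=1)

# Comma-separated input needs the delimiter spelled out.
comma_data = np.loadtxt(io.StringIO(comma_text), skiprows=1, delimiter=",")

print(tab_data)
print(comma_data)
```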

Upvotes: 0

Vibhutha Kumarage

Reputation: 1399

You can use numpy.loadtxt for this task:

>>> import numpy as np
>>> array_txt = np.loadtxt(r"C:\Vikalp\Learning\Machine Learning\Practice\carstopping.txt", usecols=(0, 1), skiprows=1)
>>> array_txt
array([[ 4.,  4.],
       [ 2.,  5.],
       [ 8.,  5.],
       [ 8.,  5.],
       [ 4.,  5.],
       [ 6.,  7.]])
>>> x = array_txt[:,0]
>>> x
array([ 4.,  2.,  8.,  8.,  4.,  6.])
>>> y = array_txt[:,1]
>>> y
array([ 4.,  5.,  5.,  5.,  5.,  7.])
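
On the question's second point, the ValueError most likely comes from calling str() on the whole collection: that produces one long string starting with '[', so map(int, ...) then walks over it character by character. A minimal sketch of cleaning and converting each element instead, using a few values copied from the printed output in the question (the variable names are only illustrative):

```python
# Sample values taken from the output shown in the question.
raw = ['\x004', '\x002', '\x001\x003\x008']

# str() on the whole list yields a single string, not a list of cleaned items.
as_text = str(raw)
first_char = as_text[0]

# Clean and convert element by element instead.
cleaned = [int(s.replace('\x00', '')) for s in raw]

print(repr(first_char))
print(cleaned)
```

If the file is read with the correct encoding, this cleanup should not be needed, and a float array returned by loadtxt can be converted with array_txt.astype(int).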

Upvotes: 2
