Reputation: 1466
I have a .dat file that looks like this.
ID_1,5.0,5.0,5.0,...
ID_2,5.0,5.0,5.0,...
I'm trying to import the data into Python as an array.
If I do this, it gives me what amounts to a list of tuples (a structured array) rather than a plain array.
import numpy as np

data = np.genfromtxt('mydat.dat',
                     dtype=None,
                     delimiter=',')
However, when I do the following it gives an odd result, probably because that first element is not a float.
np.fromfile('mydat.dat', dtype=float)
array([ 3.45301146e-086, 3.45300781e-086, 3.25195588e-086, ...,
8.04331780e-096, 8.04331780e-096, 1.31544776e-259])
Any suggestions on this? These seem to be the two main ways to import .dat files into Python as an array, and neither gives the desired result.
Upvotes: 5
Views: 57112
Reputation: 11473
Here is one way: read each line of the 'mydat.dat' file, convert each value to str or float, and then load it into a numpy array.
import numpy as np

def is_float(string):
    """Return True if the given string can be parsed as a float, else False."""
    try:
        float(string)
        return True
    except ValueError:
        return False

data = []
with open('mydat.dat', 'r') as f:
    for line in f:
        fields = line.rstrip().split(",")
        # Convert numeric fields to float, keep the ID column as a string
        data.append([float(x) if is_float(x) else x for x in fields])

data = np.array(data, dtype='O')
Result
>>> data
array([['ID_1', 5.0, 5.0, 5.0],
['ID_2', 5.0, 5.0, 5.0]], dtype=object)
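If you then want the numbers as a regular float array, one option (a sketch assuming the first field of every row is the ID and the rest are numeric) is to split the labels from the values:

# Split the ID labels from the numeric values (assumes column 0 is the ID)
ids = [row[0] for row in data]
values = np.array([row[1:] for row in data], dtype=float)

values is then a plain 2-D float array and ids keeps the row labels.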
Also, if you can use pandas to read and manipulate the data, I would do so. pandas is considerably more efficient, especially for larger data sets, and makes the data easier to work with.
# Read the data as CSV into a DataFrame
>>> import pandas as pd
>>> df = pd.read_csv('mydat.dat', sep=",", header=None)
>>> df
0 1 2 3
0 ID_1 5.0 5.0 5.0
1 ID_2 5.0 5.0 5.0
# Transposed data, with the IDs in the first row
>>> df.T
0 1
0 ID_1 ID_2
1 5 5
2 5 5
3 5 5
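If the end goal is still a numpy array, the DataFrame converts easily; a minimal sketch (assuming the first column holds the IDs):

# Numeric columns as a plain float array, with the IDs kept separately
>>> values = df.iloc[:, 1:].to_numpy(dtype=float)
>>> ids = df[0].tolist()

With older pandas versions, df.iloc[:, 1:].values.astype(float) does the same thing.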
Upvotes: 5
Reputation: 806
You might want to use numpy loadtxt. You can specify the format of each column.
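For example, a minimal sketch using np.loadtxt with a structured dtype (the field names and the assumption of three numeric columns come from the sample data above):

import numpy as np

# One string ID field plus three float fields (adjust to the real column count)
data = np.loadtxt('mydat.dat', delimiter=',',
                  dtype={'names': ('id', 'v1', 'v2', 'v3'),
                         'formats': ('U8', 'f8', 'f8', 'f8')})

Individual columns are then available as data['id'], data['v1'], and so on.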
Upvotes: 3