oaklander114

Reputation: 3383

Reading a file into an array only returns the first 3 columns

I tried to read my CSV file into a NumPy structured array, and although it seemed to work, I can only find the first 3 columns of each row.

Below is a sample of my CSV file. I only pasted the first 5 columns because the full file would be too big. The end-of-line character is "\r\n":

1,2,3,4,5,..
061110-15-14,061110-15-18,061110-15-22,061210-15-02,061210-15-06,...
0.085539622,-0.518607931,0.072114121,1.763727267,-0.679713944,...
1.058257011,-0.473227862,-0.527200897,0.309381148,-0.473227862,...

This is my code:

import numpy
from numpy import dtype, loadtxt, float64, NaN, isfinite, all

# Open the file.
log_file = open('metab_averaged_zscored.csv')

# 1.Create a dtype from the names in the file header.
header = log_file.readline()
samples = log_file.readline()
log_names = samples.split()

fields = zip(log_names, ['f8']*len(log_names))
fields_dtype = dtype(fields)
logs = numpy.loadtxt(log_file, dtype=fields_dtype, delimiter = ",")

What I get is the following:

 logs = array([(0.085539622, -0.518607931, 0.072114121),
   (1.058257011, -0.473227862, -0.527200897),
   (1.466116577, 0.899374241, -0.466269943),
   (0.402747391, -0.334736177, -0.838561584),
   (0.130944318, 1.047554546, -0.652548242),
   (0.796330151, 1.154931255, -0.329980359),
   (1.236012671, 0.32536557, -0.453508307),
   (0.75888538, 0.120736819, -1.13594891),
   (1.253438842, -0.307437261, -0.801444111),
   (1.486744816, -0.632472495, -0.793814719),
   (1.14192242, 0.167864804, -1.485382644),
   (-0.439353401, -0.190430786, -0.306749765),
   (0.624746908, 0.859866713, 0.046744056),
   (0.867743161, 0.605924104, -0.730731083)], 
  dtype=[('061110-15-14,061110-15-18,061110-15-22,061210-15-02,061210-15-06,061210-15-10',   '<f8'),    .....

But my input file has 49 columns. Where did the rest go?

Upvotes: 0

Views: 34

Answers (2)

vikramls

Reputation: 1822

I think the issue is with this line:

log_names = samples.split()

Called with no arguments, split() splits on whitespace only, but the column names in your file are separated by commas. Try this instead:

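log_names = samples.strip().split(',')

This splits on commas only. The strip() call removes the trailing "\r\n" so that it does not end up in the last field name.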
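As a rough end-to-end check of this fix, here is a minimal sketch that runs the question's approach with the comma split. It assumes a 5-column in-memory sample built from the rows shown in the question (read through io.StringIO rather than the real 49-column file, and with "\n" line endings for simplicity); the variable csv_text is made up for the illustration.

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the CSV file, using the sample rows
# from the question (5 columns only, "\n" line endings).
csv_text = (
    "1,2,3,4,5\n"
    "061110-15-14,061110-15-18,061110-15-22,061210-15-02,061210-15-06\n"
    "0.085539622,-0.518607931,0.072114121,1.763727267,-0.679713944\n"
    "1.058257011,-0.473227862,-0.527200897,0.309381148,-0.473227862\n"
)
log_file = io.StringIO(csv_text)

header = log_file.readline()   # first line: column numbers, not used here
samples = log_file.readline()  # second line: sample names

# Strip the line ending, then split on commas to get one name per column.
log_names = samples.strip().split(',')

# One float64 field per column name.
fields_dtype = np.dtype([(name, 'f8') for name in log_names])

# The remaining lines hold the numeric data.
logs = np.loadtxt(log_file, dtype=fields_dtype, delimiter=',')

print(len(logs.dtype.names))
print(logs['061210-15-02'])
```

Building the dtype from a list comprehension instead of zip() also avoids passing a zip iterator to dtype() on Python 3.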

Upvotes: 0

Warren Weckesser

Reputation: 114781

I think you can simplify things by using genfromtxt instead of loadtxt. Try this one-liner:

data = numpy.genfromtxt('metab_averaged_zscored.csv', delimiter=',', skip_header=1, names=True)

For example,

In [73]: data = numpy.genfromtxt('metab_averaged_zscored.csv', delimiter=',', skip_header=1, names=True)

In [74]: data
Out[74]: 
array([(0.085539622, -0.518607931, 0.072114121, 1.763727267, -0.679713944),
       (1.058257011, -0.473227862, -0.527200897, 0.309381148, -0.473227862)], 
      dtype=[('0611101514', '<f8'), ('0611101518', '<f8'), ('0611101522', '<f8'), ('0612101502', '<f8'), ('0612101506', '<f8')])

(Note that genfromtxt removed the dashes from the field names.)
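To illustrate how the columns are then accessed, here is a small sketch under the same assumptions as the session above: a 5-column sample taken from the question, held in memory via io.StringIO instead of the real file (csv_text is a made-up name for the illustration). Because the dashes are dropped, fields have to be looked up by the cleaned names.

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the CSV file (5 sample columns only).
csv_text = (
    "1,2,3,4,5\n"
    "061110-15-14,061110-15-18,061110-15-22,061210-15-02,061210-15-06\n"
    "0.085539622,-0.518607931,0.072114121,1.763727267,-0.679713944\n"
    "1.058257011,-0.473227862,-0.527200897,0.309381148,-0.473227862\n"
)

# skip_header=1 skips the row of column numbers; names=True takes the
# field names from the next line.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     skip_header=1, names=True)

# Field names as genfromtxt stored them (dashes removed).
print(data.dtype.names)

# Look up a column by its cleaned name.
print(data['0612101502'])
```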

Upvotes: 1
