Question:
I am trying to read mzXML files using Pyteomics' mzxml module. The elements that I need to access are numpy.ndarray objects, which I convert to lists. The mzXML files contain several fields whose values are lists. The main objective is to combine two of these lists into a 2D array (side by side, column-wise) so that I can save it as a CSV file.
I tried np.concatenate((mzplist, mzplist2), axis=1), which raised an error saying that axis 1 is out of bounds for 1D arrays. I also tried hstack and column_stack. The closest I got was with column_stack (code below), but the result still looked 1D when I opened the CSV files: each Excel cell contains an m/z value and an intensity value separated by a space.
import os
import numpy as np
from pyteomics import mzxml

# full_path (input folder) and newfolder (output folder) are defined elsewhere
plist = []
for files in os.listdir(full_path):
    filename = os.path.basename(files)
    with mzxml.read(full_path + '\\' + filename) as reader:
        for line in reader:
            mzplist = line['m/z array'].tolist()
            mzplist2 = line['intensity array'].tolist()
            print(type(mzplist))
            # Put the two lists side by side as columns
            mzplist = np.column_stack([mzplist, mzplist2])
            #mzplist.columns = ['mass', 'Intensity']
            np.savetxt(newfolder + '\\' + filename + '.csv', mzplist)
            plist = []
            mzplist = []
            mzplist2 = []
Expected result for mzplist:
Mass  Intensity
1     2
3     4
5     6
Here line['m/z array'].tolist() yields the list [1, 3, 5, ...], and line['intensity array'].tolist() yields the list [2, 4, 6, ...].
Am I missing something?
Answer:
With 2 lists as you describe:
In [39]: alist=[1,3,5,7]; blist=[2,4,6,8]
A natural way to combine them into an array is:
In [40]: arr = np.array((alist, blist))
In [41]: arr
Out[41]:
array([[1, 3, 5, 7],
       [2, 4, 6, 8]])
The transpose of that array looks like:
In [42]: arr.T
Out[42]:
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
We can write it with savetxt:
In [44]: np.savetxt('foo.txt', arr.T, fmt='%5d')
In [45]: cat foo.txt
    1     2
    3     4
    5     6
    7     8
column_stack and c_ will produce the same array.
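A small sketch comparing these routes with the same example lists, including the np.concatenate call from the question. concatenate can only join along an axis the inputs already have, and 1D inputs only have axis 0, so each list needs reshaping to a column of shape (n, 1) first. NumPy's AxisError is a subclass of ValueError, which is why the except clause catches ValueError.

```python
import numpy as np

alist = [1, 3, 5, 7]
blist = [2, 4, 6, 8]

# Stack as rows, then transpose to get columns
via_transpose = np.array((alist, blist)).T

# column_stack and c_ treat each 1D input as a column
via_column_stack = np.column_stack((alist, blist))
via_c_ = np.c_[np.asarray(alist), np.asarray(blist)]

# concatenate needs an existing axis 1, so reshape each list to (4, 1) first
via_concatenate = np.concatenate(
    (np.reshape(alist, (-1, 1)), np.reshape(blist, (-1, 1))), axis=1
)

# Passing the 1D lists directly with axis=1 fails, as in the question
try:
    np.concatenate((alist, blist), axis=1)
except ValueError as err:
    print('concatenate on 1D input:', err)

for name, result in [('column_stack', via_column_stack),
                     ('c_', via_c_),
                     ('concatenate', via_concatenate)]:
    print(name, np.array_equal(result, via_transpose), result.shape)
```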
You can add a ',' delimiter if that is what your external reader demands.
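A minimal sketch of that, assuming a comma-separated file with a header row is what Excel should open. The temporary-directory path is hypothetical, and the column names come from the expected output in the question. Passing comments='' stops savetxt from prefixing the header line with '# '.

```python
import os
import tempfile
import numpy as np

pairs = np.column_stack(([1, 3, 5, 7], [2, 4, 6, 8]))

# Hypothetical output location; any writable path works
path = os.path.join(tempfile.gettempdir(), 'foo.csv')

# delimiter=',' gives each value its own CSV cell; header plus comments=''
# writes a plain first line instead of the default '# ' comment prefix
np.savetxt(path, pairs, fmt='%d', delimiter=',',
           header='Mass,Intensity', comments='')

with open(path) as fh:
    text = fh.read()
print(text)
```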
Do you know how to view the output of a savetxt write as plain text? Here I'm using the bash shell's cat.
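The '\\' paths in the question suggest Windows, where cmd has no cat. A Python stand-in is sketched below, assuming a hypothetical file in the temporary directory. It repeats the question's savetxt call without a delimiter or fmt, so the raw text shows what Excel receives: the default '%.18e' format and a single space between the two columns, with no comma.

```python
import os
import tempfile
import numpy as np

pairs = np.column_stack(([1, 3, 5, 7], [2, 4, 6, 8]))

# Hypothetical file; the call matches the question (no delimiter, no fmt)
path = os.path.join(tempfile.gettempdir(), 'foo.txt')
np.savetxt(path, pairs)

# Print the raw file contents, like `cat foo.txt`
with open(path) as fh:
    text = fh.read()
print(text)
```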
When people have problems reading and writing CSV files, we usually ask for samples so we can reproduce the problem. If needed, a sample of the intermediate arrays (such as the output of column_stack) may help. Otherwise we are left guessing as to what the problem is.
Answer:
"each cell of Excel contains m/z value and intensity value separated by a space"
I suspect the source of the problem is this line:
np.savetxt(newfolder + '\\' + filename + '.csv', mzplist)
because a space is the default delimiter for np.savetxt (as the documentation says). Try replacing that line with
np.savetxt(newfolder + '\\' + filename + '.csv', mzplist, delimiter=',')
and check whether that helps.
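A sketch of the question's loop with that fix applied, assuming one CSV per spectrum is wanted. save_spectrum_csv and convert_folder are hypothetical helper names, and the '{filename}_{i}.csv' naming scheme is an assumption: it keeps every spectrum from being saved to the same filename + '.csv', where only the last one would survive. os.path.join replaces the hand-built '\\' paths, and the .tolist() calls are dropped because column_stack accepts the ndarrays directly.

```python
import os
import numpy as np


def save_spectrum_csv(mz, intensity, out_path):
    """Write m/z and intensity side by side as a two-column CSV file."""
    pairs = np.column_stack((mz, intensity))
    # delimiter=',' is the suggested fix; the header names are taken
    # from the question's expected output
    np.savetxt(out_path, pairs, delimiter=',',
               header='Mass,Intensity', comments='')


def convert_folder(full_path, newfolder):
    """Hypothetical driver: write one CSV per spectrum of every .mzXML file."""
    from pyteomics import mzxml  # only needed here, not by save_spectrum_csv

    for filename in os.listdir(full_path):
        if not filename.lower().endswith('.mzxml'):
            continue  # skip anything that is not an mzXML file
        with mzxml.read(os.path.join(full_path, filename)) as reader:
            # Number the outputs so each spectrum gets its own file
            for i, spectrum in enumerate(reader):
                out_path = os.path.join(newfolder, '{}_{}.csv'.format(filename, i))
                save_spectrum_csv(spectrum['m/z array'],
                                  spectrum['intensity array'], out_path)
```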