Bong Kyo Seo

Reputation: 391

How to combine two 1D lists into 2D array?

I am trying to read mzXML files using Pyteomics' mzxml module. The elements that I need to access are numpy.ndarray objects, which I convert to lists. The mzXML files contain several columns whose values are lists. My main objective is to combine two of these lists into a 2D array (side by side, column-wise) so that I can save it as a CSV file.

I tried np.concatenate((mzplist, mzplist2), axis=1), which raised an error saying that axis 1 is out of bounds for 1D arrays. I also tried hstack and column_stack. The closest I got was with column_stack (code below), but the result looked 1D when I opened the resulting CSV files: each Excel cell contains the m/z value and the intensity value separated by a space.

plist = []

for files in os.listdir(full_path):
    filename = os.path.basename(files)
    with mzxml.read(full_path + '\\' + filename) as reader:
        for line in reader:
            mzplist = line['m/z array'].tolist()
            mzplist2 = line['intensity array'].tolist()
            print(type(mzplist))
            mzplist = np.column_stack([mzplist, mzplist2])
            #mzplist.columns = ['mass', 'Intensity']
            np.savetxt(newfolder + '\\' + filename + '.csv', mzplist) 
            plist = []
            mzplist = []
            mzplist2 = []

Expected results for mzplist:

 Mass       Intensity
  1            2
  3            4
  5            6

Here line['m/z array'].tolist() yields a list [1, 3, 5, ...], and line['intensity array'].tolist() yields a list [2, 4, 6, ...].

Am I missing something?

Upvotes: 0

Views: 4491

Answers (2)

hpaulj

Reputation: 231738

With 2 lists as you describe:

In [39]: alist=[1,3,5,7]; blist=[2,4,6,8]

A natural way to combine them into an array is:

In [40]: arr = np.array((alist, blist))
In [41]: arr
Out[41]: 
array([[1, 3, 5, 7],
       [2, 4, 6, 8]])

Transpose of that array looks like:

In [42]: arr.T
Out[42]: 
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

Which we can write with savetxt as:

In [44]: np.savetxt('foo.txt', arr.T, fmt='%5d')
In [45]: cat foo.txt
    1     2
    3     4
    5     6
    7     8

column_stack and c_ will produce the same array.
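As a quick check of that equivalence, here is a minimal sketch using the same two example lists. It also shows why np.concatenate with axis=1 fails on the question's inputs: 1D arrays have only axis 0. The variable names are illustrative only.

```python
import numpy as np

alist = np.asarray([1, 3, 5, 7])
blist = np.asarray([2, 4, 6, 8])

# Three ways to build the same (4, 2) array, one input per column.
by_transpose = np.array((alist, blist)).T
by_column_stack = np.column_stack((alist, blist))
by_c = np.c_[alist, blist]

same = (np.array_equal(by_transpose, by_column_stack)
        and np.array_equal(by_transpose, by_c))
shape = by_column_stack.shape

# 1D inputs have no axis 1, so this call raises an axis error
# (a subclass of ValueError).
try:
    np.concatenate((alist, blist), axis=1)
    axis_error_raised = False
except ValueError:
    axis_error_raised = True

print(same, shape, axis_error_raised)
```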

You can add a ',' delimiter if that is what your external reader demands.

Do you know how to read the output of a savetxt write as plain text? I'm using the bash shell cat.

When people have problems reading and writing CSV files, we usually ask for samples so we can reproduce the problem. If needed, a sample of the intermediate arrays (such as the output of column_stack) may help. Otherwise we are left guessing what the problem is.

Upvotes: 1

Daweo

Reputation: 36838

each cell of Excel contains m/z value and intensity value separated by a space

I suspect the source of the problem is this line

np.savetxt(newfolder + '\\' + filename + '.csv', mzplist)

since a space is the default delimiter for np.savetxt (as the documentation says). Try replacing that line with

np.savetxt(newfolder + '\\' + filename + '.csv', mzplist, delimiter=',')

and check if that would help.
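To see the effect of the delimiter in isolation, here is a minimal, self-contained sketch. The sample values, the temporary output path, the header text, and the fmt="%g" choice are assumptions for illustration and are not taken from the question's data.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for line['m/z array'] and line['intensity array'].
mz = np.array([1.0, 3.0, 5.0])
intensity = np.array([2.0, 4.0, 6.0])

# One row per peak, two columns: mass and intensity.
table = np.column_stack((mz, intensity))

# Write to a temporary directory so the sketch does not touch real data.
out_path = os.path.join(tempfile.mkdtemp(), "example_scan.csv")
np.savetxt(
    out_path,
    table,
    delimiter=",",            # comma instead of the default space
    header="Mass,Intensity",  # optional header row
    comments="",              # suppress the default '# ' header prefix
    fmt="%g",                 # compact number formatting
)

with open(out_path) as fh:
    lines = fh.read().splitlines()

print(lines)
```

With a comma delimiter, a spreadsheet program should place each value in its own cell, provided it treats commas as the field separator.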

Upvotes: 2
