numpy arrays dimension mismatch

Question

I am using numpy and pandas to try to concatenate a number of heterogeneous values into a single array.

np.concatenate((tmp, id, freqs))

Here are the exact values:

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"

The dimensions of tmp, 17232, and freqs are as follows:

[in]  tmp.shape
[out] (4,)
[in]  np.array(17232).shape
[out] ()
[in]  freqs.shape
[out] (1,)
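A quick way to see the mismatch the error refers to is to compare ndim rather than shape. np.concatenate converts each input to an array (roughly what np.asarray does), so a bare string or number becomes 0-dimensional. This sketch renames id to id_ only to avoid shadowing Python's built-in id().

```python
import numpy as np

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id_ = "id_23728"  # renamed from id to avoid shadowing the built-in id()

# Convert each input the way concatenate would and report its number of dimensions
for name, value in [("tmp", tmp), ("id", id_), ("17232", 17232), ("freqs", freqs)]:
    print(name, np.asarray(value).ndim)
```

tmp and freqs come out 1-D, while the string and the integer come out 0-D, which is exactly the "same number of dimensions" complaint.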

I have also tried casting them all as numpy arrays to no avail.

Note that freqs will frequently contain more than one value.

However, with both the np.concatenate and np.append functions I get the following error:

*** ValueError: all the input arrays must have same number of dimensions

These all have the same number of columns (0), so why can't I concatenate them with either of the numpy methods described above?

All I want is [(tmp), 17232, (freqs)] as a single one-dimensional array, which will then be appended to the end of a pandas DataFrame.

Thanks.

Update

It appears I can concatenate the two existing arrays:

np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)

However, the integer cannot be used in concatenate, even when cast to a numpy array:

np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions

What does work, however, is nesting append inside concatenate:

np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
       0.022831050228310501], dtype=object)
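The nested version works because np.append, when called without an axis argument, ravels (flattens) both of its arguments before concatenating them, so the 0-d integer is turned into a length-1 array. A minimal sketch of that effect, using the values from the question:

```python
import numpy as np

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)

# Raveling a 0-d array gives it a single axis of length 1
print(np.ravel(np.array(17571)).shape)  # (1,)

# np.append(tmp, 17571) ravels 17571 internally, so concatenate then sees only 1-D inputs
row = np.concatenate((np.append(tmp, 17571), freqs))
print(row.shape)  # (6,)
```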

This is rather messy, though. Does anyone have a better solution for concatenating a number of heterogeneous arrays?

gg349 · Accepted Answer

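The problem is that id, and later the integer np.array(17571), are zero-dimensional: numpy turns a bare string or number into a 0-d array with shape (), while tmp and freqs are 1-D, and np.concatenate requires every input to have the same number of dimensions. See here how numpy decides whether an object can be converted automatically to a numpy array or not.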

The solution is to make id a 1-D array_like, i.e. to make it an element of a list or tuple, so that numpy understands that id belongs to a one-dimensional sequence.

It all boils down to

concatenate((tmp, (id,), freqs))

or

concatenate((tmp, [id], freqs))
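Both spellings can be checked with the values from the question. This sketch writes np.concatenate explicitly and uses id_ instead of id to avoid shadowing the built-in; otherwise it is the same call as above.

```python
import numpy as np

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id_ = "id_23728"

# A one-element tuple or list is converted to a 1-D array of length 1
row_t = np.concatenate((tmp, (id_,), freqs))
row_l = np.concatenate((tmp, [id_], freqs))
print(row_l)
```

Because tmp and freqs have dtype=object, the result is also an object array, so the integer and float keep their Python types alongside the strings.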

To avoid this sort of problem when passing input variables to functions that use numpy, you can use atleast_1d, as pointed out by @askewchan. See this question/answer for more about it.

If you are unsure whether, in different scenarios, your variable id will be a single str or a list of str, you are better off using

concatenate((tmp, atleast_1d(id), freqs))

because the two options above will fail if id is already a list/tuple of strings.
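A sketch of the atleast_1d approach wrapped in a small function. build_row is a hypothetical helper name, and "id_00000" is a made-up placeholder used only to show the list case. The sketch also confirms that the [id] wrapping fails for a list, and that np.hstack, which applies atleast_1d to each input before concatenating 1-D arrays, gives the same row.

```python
import numpy as np

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)


def build_row(tmp, ids, freqs):
    """Hypothetical helper: ids may be a single str or a list/tuple of str."""
    # atleast_1d turns a scalar into shape (1,) and leaves 1-D input unchanged
    return np.concatenate((tmp, np.atleast_1d(ids), freqs))


one_id = build_row(tmp, "id_23728", freqs)
two_ids = build_row(tmp, ["id_23728", "id_00000"], freqs)  # "id_00000" is a placeholder
print(one_id.shape, two_ids.shape)  # (6,) (7,)

# Wrapping a list again with [...] produces a 2-D input, so concatenate rejects it
try:
    np.concatenate((tmp, [["id_23728", "id_00000"]], freqs))
except ValueError as exc:
    print("ValueError:", exc)

# np.hstack calls atleast_1d on every input, so scalars need no wrapping
stacked = np.hstack((tmp, "id_23728", freqs))
print(stacked)
```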

EDIT: It may not be obvious why np.array(17571) cannot be concatenated even though it is a numpy array. This happens because np.array(17571).shape == (), so it has no dimensions: it has no axis to concatenate along, and it is not even iterable.
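The 0-d behaviour is easy to inspect directly. This sketch shows the empty shape, the TypeError raised when trying to iterate over a 0-d array, and how atleast_1d gives it the single axis that concatenate needs.

```python
import numpy as np

x = np.array(17571)
print(x.shape, x.ndim)  # () 0

# A 0-d array has no axis to iterate over
try:
    iter(x)
except TypeError as exc:
    print("TypeError:", exc)

# atleast_1d adds the missing axis, giving shape (1,)
print(np.atleast_1d(x).shape)  # (1,)
```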
