Reputation: 622
I am using numpy and pandas to attempt to concatenate a number of heterogeneous values into a single array.
np.concatenate((tmp, id, freqs))
Here are the exact values:
tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"
The dimensions of tmp, 17232, and freqs are as follows:
[in] tmp.shape
[out] (4,)
[in] np.array(17232).shape
[out] ()
[in] freqs.shape
[out] (1,)
I have also tried casting them all as numpy arrays, to no avail. Note that the variable freqs will frequently have more than one value.
However, with both the np.concatenate and np.append functions I get the following error:
*** ValueError: all the input arrays must have same number of dimensions
These all have the same number of columns (0), so why can't I concatenate them with either of the numpy methods described above?
All I'm looking to obtain is [(tmp), 17232, (freqs)] in one single-dimensional array, which is to be appended onto the end of a pandas dataframe.
Thanks.
Update
It appears I can concatenate the two existing arrays:
np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)
However, the integer, even when cast to a numpy array, cannot be used in concatenate.
np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions
What does work, however, is nesting append and concatenate:
np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
0.022831050228310501], dtype=object)
This is kind of messy, though. Does anyone have a better solution for concatenating a number of heterogeneous arrays?
Upvotes: 3
Views: 8372
Reputation: 22701
The problem is that id, and later the integer np.array(17571), are not 1D array_like objects: each is a scalar with zero dimensions, while tmp and freqs have one. See here how numpy decides whether an object can be converted automatically to a numpy array or not.
The solution is to make id a 1D array_like, i.e. to make it an element of a list or tuple, so that numpy understands that id belongs to a 1D array_like structure.
It all boils down to
np.concatenate((tmp, (id,), freqs))
or
np.concatenate((tmp, [id], freqs))
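As a minimal sketch of the list-wrapping approach, using the values from the question (the variable is renamed to id_ here only to avoid shadowing Python's built-in id; that rename is my assumption, not part of the original code):

```python
import numpy as np

# Values taken from the question.
tmp = np.array(['DNMT3A', 'p.M880V', 'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id_ = "id_23728"  # renamed from `id` to avoid shadowing the built-in

# Wrapping the scalar in a list makes it a 1D array_like of length 1,
# so all three inputs have the same number of dimensions.
result = np.concatenate((tmp, [id_], freqs))

print(result.shape)  # (6,)
print(result.dtype)  # object
```

Using a tuple, (id_,), should behave the same way as the list.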
To avoid this sort of problem when dealing with input variables in functions using numpy, you can use atleast_1d, as pointed out by @askewchan. See this question/answer for more about it.
Basically, if you are unsure whether, in different scenarios, your variable id will be a single str or a list of str, you are better off using
np.concatenate((tmp, np.atleast_1d(id), freqs))
because the two options above will fail if id is already a list/tuple of strings.
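A small sketch of how atleast_1d copes with both cases. The helper name build_row and the ids "id_1" and "id_2" are hypothetical, introduced only for illustration:

```python
import numpy as np

tmp = np.array(['DNMT3A', 'p.M880V', 'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)

def build_row(tmp, ids, freqs):
    """Hypothetical helper: join tmp, one or more ids, and freqs into one 1D array."""
    # atleast_1d turns a scalar into a length-1 array and leaves
    # lists/tuples/arrays that are already 1D unchanged.
    return np.concatenate((tmp, np.atleast_1d(ids), freqs))

single = build_row(tmp, "id_23728", freqs)          # one string
several = build_row(tmp, ["id_1", "id_2"], freqs)   # list of strings
integer = build_row(tmp, 17571, freqs)              # plain integer

print(single.shape, several.shape, integer.shape)  # (6,) (7,) (6,)
```

The same call works whether the middle argument is a scalar or a sequence, which is the case that trips up the explicit [id] or (id,) wrapping.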
EDIT: It may not be obvious why np.array(17571) cannot be concatenated as it is. This happens because np.array(17571).shape == (), so it has no dimensions and is not iterable.
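To make the zero-dimensional case concrete, here is a short check (a sketch; the exact error message text differs between numpy versions, so only the exception types are relied on):

```python
import numpy as np

scalar = np.array(17571)
print(scalar.shape, scalar.ndim)  # () 0

# A 0-d array cannot be iterated over.
try:
    iter(scalar)
    iterable = True
except TypeError:
    iterable = False
print(iterable)  # False

# Concatenating it with a 1D array raises ValueError.
try:
    np.concatenate([np.array([1, 2]), scalar])
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# Promoting it to 1D fixes both problems.
wrapped = np.atleast_1d(scalar)
print(wrapped.shape, wrapped.ndim)  # (1,) 1
```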
Upvotes: 2