Pandas Series.filter.values returning different type than numpy array

Question

I am trying to run scipy.stats.entropy on two arrays, once for each row of a Pandas DataFrame, via the apply function:

import numpy as np
from scipy import stats

def calculate_H(row):
    pk = np.histogram(row.filter(regex='stuff'), bins=16)[0]
    qk = row.filter(regex='other').values
    return stats.entropy(pk, qk, base=2)

df['DKL'] = df.apply(calculate_H, axis=1)
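One common way to end up here, sketched under an assumption the question does not state: the DataFrame also holds a non-numeric column (the `label` column and all data below are hypothetical). `apply(..., axis=1)` passes the function one Series per row, and a Series has a single dtype, so one string column forces every row to `object` dtype:

```python
import pandas as pd

# Hypothetical frame: one string column plus numeric 'stuff'/'other' columns.
df = pd.DataFrame({
    "label": ["a", "b"],
    "stuff_1": [0.1, 0.5],
    "stuff_2": [0.7, 0.2],
    "other_1": [12024, 9643],
    "other_2": [7681, 8193],
})

def row_dtype(row):
    # Report the dtype of the array that filter(...).values returns for this row.
    return str(row.filter(regex="other").values.dtype)

print(df.dtypes)                               # each column keeps its own dtype
print(df.apply(row_dtype, axis=1).tolist())    # ['object', 'object']
```

Printing `df.dtypes` on the real frame is a quick way to check whether such a column exists.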

I am getting the following error:

TypeError: ufunc 'xlogy' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
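The message comes from a NumPy ufunc, and such ufuncs only have loops for numeric dtypes, not for `object`. A small sketch calling `scipy.special.xlogy` directly (the ufunc named in the message; calling it by hand is an illustration, not what `stats.entropy` does line for line) shows the same integers failing as an `object` array and working as an integer array:

```python
import numpy as np
from scipy import special

counts = [12024, 9643, 7681, 8193]
as_int = np.array(counts)                 # integer dtype
as_obj = np.array(counts, dtype=object)   # same values, object dtype

# Integers can be safely cast to float64, so a float loop is found.
print(special.xlogy(as_int, 2.0))

try:
    special.xlogy(as_obj, 2.0)
except TypeError as exc:
    # Exact wording of the message varies between NumPy versions.
    print("TypeError:", exc)
```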

(I've also tried qk = row[row.filter(regex='other').index].values)

I know the problem is qk: if I pass a different array as qk, it works. Pandas seems to be giving me something that reports itself as a numpy array but does not behave quite like one. All of the following work:

qk1 = np.array([12024, 9643, 7681, 8193, 8012, 7846, 7615, 7484, 5966, 11484, 13627, 17749, 9820, 5336, 4611, 3366])
qk2 = pd.Series([12024, 9643, 7681, 8193, 8012, 7846, 7615, 7484, 5966, 11484, 13627, 17749, 9820, 5336, 4611, 3366]).values
qk3 = df.filter(regex='other').iloc[0].values

If I check the types, e.g. type(qk) == type(qk1), I get True (they are all numpy.ndarray). np.array_equal also returns True.
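Neither check looks at what the array holds. `type()` only reports the container class, and every NumPy array is a `numpy.ndarray`; `np.array_equal` compares element values, and an `object` array of Python ints equals an integer array holding the same numbers. The attribute that differs is `.dtype`. In this sketch the `object` array is a hypothetical stand-in for the failing `qk`:

```python
import numpy as np

qk1 = np.array([12024, 9643, 7681, 8193])
qk = np.array([12024, 9643, 7681, 8193], dtype=object)  # stand-in for the failing array

print(type(qk) == type(qk1))     # True: both are numpy.ndarray
print(np.array_equal(qk, qk1))   # True: the element values match
print(qk1.dtype, qk.dtype)       # int64 object (int32 instead of int64 on Windows with NumPy 1.x)
```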

The only hint I have is how the arrays print. The one that works is on top and the one that fails is on the bottom:

[12024  9643  7681  8193  8012  7846  7615  7484  5966 11484 13627 17749  9820  5336  4611  3366]
[12024 9643 7681 8193 8012 7846 7615 7484 5966 11484 13627 17749 9820 5336 4611 3366]

Notice that the top one has wider spacing between values.
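That spacing is itself a symptom of the dtype. For numeric arrays NumPy pads every element to a common width, while for `object` arrays it prints each element's `repr` with no padding. Printing the same values both ways:

```python
import numpy as np

values = [12024, 9643, 7681, 8193]
print(np.array(values))                # [12024  9643  7681  8193]  padded to width 5
print(np.array(values, dtype=object))  # [12024 9643 7681 8193]  no padding
```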

TL;DR: these two expressions return different things:

df.filter(regex='other').iloc[0].values
df.iloc[0].filter(regex='other').values
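The order of selection matters when the frame mixes column dtypes (again assuming, hypothetically, a string column next to the numeric ones). `df.filter(...)` first narrows the frame to the integer columns, so the row taken afterwards stays integer. `df.iloc[0]` first builds a row from every column, which has to be upcast to `object` to hold the string, and filtering that row afterwards does not change its dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "label": ["a"],      # hypothetical non-numeric column
    "other_1": [12024],
    "other_2": [9643],
})

filter_first = df.filter(regex="other").iloc[0].values   # columns first, then row
row_first = df.iloc[0].filter(regex="other").values      # row first, then columns

print(filter_first.dtype)   # int64
print(row_first.dtype)      # object
```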

Warren Weckesser · Accepted Answer

I suspect qk is an object array and not an array of integers. In calculate_H, try this:

qk = row.filter(regex='other').values.astype(int)

(i.e. cast the values to an array of integers).
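A sketch of the question's function with this cast applied. It assumes a hypothetical frame with one string column, four 'stuff' columns and four 'other' columns, so that `pk` (4 histogram bins) and `qk` have the same length; the question uses 16 of each. The histogram input comes from the same `object` row, so as a precaution this sketch casts it to `float` too:

```python
import numpy as np
import pandas as pd
from scipy import stats

N_BINS = 4  # hypothetical; must equal the number of 'other' columns

df = pd.DataFrame({
    "label": ["a", "b"],  # hypothetical string column that makes each row object dtype
    "stuff_1": [0.1, 0.9], "stuff_2": [0.4, 0.8],
    "stuff_3": [0.5, 0.2], "stuff_4": [0.9, 0.3],
    "other_1": [12024, 9643], "other_2": [7681, 8193],
    "other_3": [8012, 7846], "other_4": [7615, 7484],
})

def calculate_H(row):
    # Inside apply(axis=1) this row is object dtype, so cast before using ufuncs.
    stuff = row.filter(regex="stuff").values.astype(float)
    pk = np.histogram(stuff, bins=N_BINS)[0]
    qk = row.filter(regex="other").values.astype(int)  # the cast suggested above
    return stats.entropy(pk, qk, base=2)

df["DKL"] = df.apply(calculate_H, axis=1)
print(df["DKL"])
```

`.astype(float)` would serve as well as `.astype(int)` here, because `stats.entropy` normalizes both `pk` and `qk` itself. Another option is to select the numeric columns before iterating, so the rows never become `object` in the first place.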