pandas apply returning NaN

Question

I have a JSON string that I convert to a dictionary, and then I create a dataframe from certain key-value pairs in that dictionary:

import json
import pandas as pd

# json
a = """{
    "cluster_id": 3,
    "cluster_observation_data": [[1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 1]],
    "cluster_observation_label": [0, 1],
    "cluster_centroid": [1, 2, 3, 4, 5, 6, 7, 10],
    "observation_id":["id_xyz_999","id_abc_000"]
}"""

# convert to dictionary
data = json.loads(a)
sub_dict = dict((k, data[k]) for k in ('cluster_observation_data', 'cluster_observation_label'))
train = pd.DataFrame.from_dict(sub_dict, orient='columns')

After converting it to a dataframe, I am trying to calculate each observation's euclidean distance from the cluster_centroid present in the data dictionary. The function works fine, but in the final train dataframe I am getting NaNs:

def distance_from_center(row):
    centre = data['cluster_centroid']
    obs_data = row[0]
    print('obs_data', obs_data)
    print('\n\n\n\n')
    print('center', centre)
    # print(type(obs_data))
    # print(type(centre))
    dist = sum([(a - b)**2 for a, b in zip(centre, obs_data)])
    print(dist)
    return dist

train.loc[:, 'center_dist'] = train.loc[:, ['cluster_observation_data']].apply(distance_from_center)

I'm not able to figure out where I am going wrong. Even a small hint will do.

mcsim · Accepted Answer

You need to pass the axis argument, like this:

train.loc[:, 'center_dist'] = train.loc[:, ['cluster_observation_data']].apply(distance_from_center, 1)

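The reason is that you want to apply the function to each list individually. By default, apply uses axis=0, which calls the function once per column, so the result is indexed by the column name rather than by the row labels. That index does not align with the rows of train, so the assignment fills center_dist with NaN. The documentation for axis says: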

1 or ‘columns’: apply function to each row
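
To make the difference concrete, here is a minimal sketch that runs both variants on the data from the question. The names raw, squared_distance, by_column and by_row are made up for this illustration and are not from the original post. Like the function in the question, squared_distance returns the sum of squared differences; take its square root if you need the actual euclidean distance. The rows are read by column label instead of row[0], on the assumption that label-based access is the safer choice across pandas versions.

```python
import json
import pandas as pd

raw = """{
    "cluster_observation_data": [[1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 4, 5, 6, 7, 8, 1]],
    "cluster_observation_label": [0, 1],
    "cluster_centroid": [1, 2, 3, 4, 5, 6, 7, 10]
}"""
data = json.loads(raw)
keys = ("cluster_observation_data", "cluster_observation_label")
train = pd.DataFrame({k: data[k] for k in keys})


def squared_distance(obs):
    # Sum of squared differences between the centroid and one observation.
    return sum((c - o) ** 2 for c, o in zip(data["cluster_centroid"], obs))


obs_frame = train[["cluster_observation_data"]]

# Default axis=0: the function is called once per column, so the result
# is indexed by the column name, not by the row labels 0 and 1.
by_column = obs_frame.apply(lambda col: squared_distance(col.iloc[0]))

# axis=1: the function is called once per row, so the result shares the
# row index of train and can be assigned back without producing NaN.
by_row = obs_frame.apply(
    lambda row: squared_distance(row["cluster_observation_data"]), axis=1
)

train["center_dist"] = by_row
print(by_column.index.tolist())        # ['cluster_observation_data']
print(train["center_dist"].tolist())   # [4, 88]
```

Since only one column is involved, calling apply on the Series itself, train['cluster_observation_data'].apply(squared_distance), should give the same per-row result without needing axis at all.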
