AlmostAI

Reputation: 347

KeyError: "None of [Index([...])] are in the [columns]"

I have a NumPy array with shape (3, 50):

data = np.array([[0, 3, 0, 2, 0, 0, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 2, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 7, 0, 0, 0, 0,
        1, 1, 2, 0, 0, 2],
       [0, 0, 0, 0, 0, 3, 0, 1, 6, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 0,
        3, 0, 0, 0, 0, 0, 0, 5, 2, 2, 2, 1, 0, 0, 1, 0, 1, 3, 2, 0, 0, 0,
        0, 0, 2, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0]])

and the following column names:

new_cols = [f'description_word_{i+1}_count' for i in range(50)]

I'm trying to add them as new columns to an existing dataframe like this:

df[new_cols] = data

but I get this error:

KeyError: "None of [Index(['description_word_1_count', 'description_word_2_count',\n 'description_word_3_count', 'description_word_4_count',\n 'description_word_5_count', 'description_word_6_count',\n 'description_word_7_count', 'description_word_8_count',\n 'description_word_9_count', 'description_word_10_count',\n 'description_word_11_count', 'description_word_12_count',\n 'description_word_13_count', 'description_word_14_count',\n 'description_word_15_count', 'description_word_16_count',\n 'description_word_17_count', 'description_word_18_count',\n 'description_word_19_count', 'description_word_20_count',\n 'description_word_21_count', 'description_word_22_count',\n 'description_word_23_count', 'description_word_24_count',\n 'description_word_25_count', 'description_word_26_count',\n 'description_word_27_count', 'description_word_28_count',\n 'description_word_29_count', 'description_word_30_count',\n 'description_word_31_count', 'description_word_32_count',\n 'description_word_33_count', 'description_word_34_count',\n 'description_word_35_count', 'description_word_36_count',\n 'description_word_37_count', 'description_word_38_count',\n 'description_word_39_count', 'description_word_40_count',\n 'description_word_41_count', 'description_word_42_count',\n 'description_word_43_count', 'description_word_44_count',\n 'description_word_45_count', 'description_word_46_count',\n 'description_word_47_count', 'description_word_48_count',\n 'description_word_49_count', 'description_word_50_count'],\n dtype='object')] are in the [columns]"

I also don't understand where the '\n' characters in my column names come from.

At the same time, creating a new dataframe from the same data works fine:

new_df = pd.DataFrame(data=data, columns=new_cols)

Does anyone know what is causing the error?

Upvotes: 2

Views: 5855

Answers (1)

O Pardal

Reputation: 672

Suppose you have a df like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'person': [1, 1, 1], 'event': ['A', 'B', 'C']})

You can add new columns like this:


data = np.array([[0, 3, 0, 2, 0, 0, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 2, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 7, 0, 0, 0, 0,
        1, 1, 2, 0, 0, 2],
       [0, 0, 0, 0, 0, 3, 0, 1, 6, 1, 1, 0, 0, 0, 0, 2, 0, 0, 1, 0, 1, 0,
        3, 0, 0, 0, 0, 0, 0, 5, 2, 2, 2, 1, 0, 0, 1, 0, 1, 3, 2, 0, 0, 0,
        0, 0, 2, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0,
        0, 0, 0, 0, 0, 0]])

new_cols = [f'description_word_{i+1}_count' for i in range(50)]

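# Wrap the array in a DataFrame that shares df's index;
# its 50 columns are assigned to new_cols in order.
df[new_cols] = pd.DataFrame(data, index=df.index)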
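
If you would rather not assign through df[new_cols] at all, another option is to build the new frame first (which you said already works) and join it on with pd.concat. This is only a sketch: the data array here is a made-up stand-in with the same (3, 50) shape, and result is a name I picked for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'person': [1, 1, 1], 'event': ['A', 'B', 'C']})

# Hypothetical stand-in for the real (3, 50) array from the question
data = np.arange(150).reshape(3, 50)
new_cols = [f'description_word_{i+1}_count' for i in range(50)]

# Build the new columns as their own frame, reusing df's index so rows line up
new_df = pd.DataFrame(data, columns=new_cols, index=df.index)

# Glue the two frames together side by side
result = pd.concat([df, new_df], axis=1)
print(result.shape)
```

Passing index=df.index matters when df does not have the default 0..n-1 index, because concat with axis=1 aligns rows by index label.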

I think the problem is that you are using the syntax for assigning a single series (one column), when you actually need to assign several series at once. In other words, a dataframe.
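
As for the '\n' in the message: as far as I can tell, they are not in your column names. The error text contains the printed form of the Index, which wraps over several lines, and KeyError displays its message as a quoted string, so every real line break shows up as a literal backslash-n. A small sketch with plain Python (the shortened message is made up for illustration):

```python
# A made-up, shortened message with a real line break in it,
# similar to how a long Index is wrapped when printed
message = "None of [Index(['col_1',\n 'col_2'])] are in the [columns]"

err = KeyError(message)

# str() of a KeyError is the repr of its argument, so the line break
# is shown as the two characters backslash + n
shown = str(err)
print(shown)

# The column names themselves contain no newline
names = ['col_1', 'col_2']
print(any('\n' in name for name in names))
```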

Upvotes: 3
