Reputation: 2533
I have the following (toy) data set:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Manufacturer': ['Allen Edmonds', 'Louis Vuitton 23', 'Louis Vuitton 8', 'Gulfstream', 'Bombardier', '23 - Louis Vuitton', 'Louis Vuitton 20'],
    'System': ['None', 'None', '14 Platinum', 'Gold', 'None', 'Platinum 905', 'None'],
})
Next, I create a column named Pricing based on the two existing columns:
df.loc[(df['Manufacturer'].str.contains('Louis')) &
       (df['System'].str.contains('Platinum')),
       'Pricing'] = 'East Coast'
On the toy data set, this approach works as expected. However, on the production data (which, unfortunately, I cannot share), I see the following error message:
KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan...], \n dtype='float64', length=583)] are in the [index]"
At first, I thought the error might be caused by whitespace in the column headers, but that doesn't appear to be the case.
The column headers are assigned as follows:
for elem in elements:
    d = {
        'Manufacturer': issue.fields.manufacturer,
        'System': issue.fields.system
    }
(The data comes from a database.)
Any idea what might be causing this KeyError?
Maybe I need to use an adaptation of:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis'), 'East Coast', 'None')
But I'm not sure how to use np.where with two conditions (see "How to create a column in a Pandas dataframe based on a conditional substring search of one or more OTHER columns" for my original question).
Thanks in advance!
Upvotes: 1
Views: 195
Reputation: 1094
It's hard to diagnose the error without the data.
You can try np.where with two conditions like this:
df['Pricing'] = np.where(df['Manufacturer'].str.contains('Louis') & df['System'].str.contains('Platinum'), 'East Coast', None)
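One possible cause, and this is only a guess since the production data isn't available: if the database returns missing values (None/NaN) in either column, str.contains yields NaN for those rows, so the combined condition is no longer a clean boolean mask and .loc may try to interpret it as index labels. A minimal sketch of a workaround, assuming that is the situation, is to pass na=False so missing values count as "no match". The sample rows below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: None/NaN stand in for missing values from the database.
df = pd.DataFrame({
    'Manufacturer': ['Louis Vuitton 8', None, '23 - Louis Vuitton', 'Gulfstream'],
    'System': ['14 Platinum', 'Gold', np.nan, 'Platinum 905'],
})

# na=False makes str.contains return False (instead of NaN) for missing
# values, so each condition is a plain boolean Series.
is_louis = df['Manufacturer'].str.contains('Louis', na=False)
is_platinum = df['System'].str.contains('Platinum', na=False)
mask = is_louis & is_platinum

# np.where with two conditions combined via &
df['Pricing'] = np.where(mask, 'East Coast', None)

# The original .loc approach, using the same boolean mask
df.loc[mask, 'Pricing_loc'] = 'East Coast'

print(df)
```

If the columns hold non-string values other than missing ones (numbers, for example), converting with astype(str) before calling str.contains may also be needed; that depends on what the database actually returns.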
Upvotes: 1