Reputation: 21
I have a problem filling in a column with pandas. I want to add strings that describe each customer's annual income class: 20% of the rows in the data frame should get the value "Lowest", 9% should get "Lower Middle", and so on. I thought of building a list, appending the values, and then assigning the list to the column, but I get this error: ValueError: Length of values (5) does not match length of index (500)
list_of_lists = []
list_of_lists.append(int(0.2*len(df))*"Lowest")
list_of_lists.append(int(0.09*len(df))*"Lower Middle")
list_of_lists.append(int(0.5*len(df))*"Middle")
list_of_lists.append(int(0.12*len(df))*"Upper Middle")
list_of_lists.append(int(0.12*len(df))*"Highest")
df["Annual Income"] = list_of_lists
Do you have an idea of what could be the best way to do this?
Thanks in advance. Best regards, Alina
Upvotes: 0
Views: 61
Reputation: 16147
You can use numpy to do a weighted choice. numpy.random.choice takes a list of choices, the number of choices to make, and the probabilities. You could generate the values this way and then just do df['Annual Income'] = incomes
I've printed the value counts so you can see the totals. Because the draw is random, they will be slightly different every time.
Also, I had to tweak the probabilities so they add up to 100% (the ones in the question add up to 103%).
import pandas as pd
from numpy.random import choice

# Draw 500 labels at random, weighted by p (the probabilities must sum to 1)
incomes = choice(['Lowest', 'Lower Middle', 'Middle', 'Upper Middle', 'Highest'], 500,
                 p=[.2, .09, .49, .11, .11])
df = pd.DataFrame({'Annual Income': incomes})
df.value_counts()
Annual Income
Middle 245
Lowest 87
Upper Middle 66
Highest 57
Lower Middle 45
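If you need the class sizes to be exact rather than random, a deterministic variant might look like the sketch below. For context, the original attempt fails because int(0.2*len(df))*"Lowest" multiplies a string, which yields one long repeated string, so the list ends up with only 5 elements. The sketch assumes a 500-row frame (the customer_id column is a hypothetical stand-in), reuses the tweaked shares from above, assigns any rounding remainder to the last class, and uses an arbitrary seed for the shuffle.

```python
import numpy as np
import pandas as pd

# Hypothetical 500-row frame standing in for the asker's df
df = pd.DataFrame({"customer_id": range(500)})

labels = ["Lowest", "Lower Middle", "Middle", "Upper Middle", "Highest"]
shares = [0.20, 0.09, 0.49, 0.11, 0.11]  # tweaked shares from above, sum to 1

# Convert shares to whole-row counts; the last class absorbs any rounding remainder
counts = [int(round(s * len(df))) for s in shares]
counts[-1] = len(df) - sum(counts[:-1])

# np.repeat repeats each label by its count, giving exactly len(df) values
incomes = np.repeat(labels, counts)

# Shuffle in place so the classes are not in contiguous blocks (seed is arbitrary)
rng = np.random.default_rng(0)
rng.shuffle(incomes)

df["Annual Income"] = incomes
print(df["Annual Income"].value_counts())
```

With this approach the counts are the same on every run; only the row order of the labels depends on the seed.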
Upvotes: 1