Alina

Reputation: 21

Filling column with values (pandas)

I have a problem filling in values in a column with pandas. I want to add strings that describe the annual income class of each customer: 20% of the rows of the DataFrame should get the value "Lowest", 9% should get "Lower Middle", and so on. I thought of creating a list, appending the values, and then assigning the list to the column, but then I get ValueError: Length of values (5) does not match length of index (500).

list_of_lists = []
list_of_lists.append(int(0.2*len(df))*"Lowest")
list_of_lists.append(int(0.09*len(df))*"Lower Middle")
list_of_lists.append(int(0.5*len(df))*"Middle")
list_of_lists.append(int(0.12*len(df))*"Upper Middle")
list_of_lists.append(int(0.12*len(df))*"Highest")
df["Annual Income"] = list_of_lists

Do you have an idea of the best way to do this?

Thanks in advance! Best regards, Alina
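The ValueError in the question has two causes. First, `int(0.2*len(df))*"Lowest"` repeats the string itself, producing one long string such as "LowestLowest...", so `list_of_lists` ends up with 5 elements instead of 500. Second, the shares 20 + 9 + 50 + 12 + 12 add up to 103%, so even correct counts would give 515 labels for 500 rows. Below is a minimal sketch of a deterministic fix. It assumes a 500-row DataFrame (the `id` column is a hypothetical stand-in for the real data) and assumes the shares should be rescaled to 100%. It extends a list with `[label] * n` and gives any rows lost to rounding to the last class.

```python
import pandas as pd

# Hypothetical stand-in for the question's 500-row DataFrame.
df = pd.DataFrame({"id": range(500)})

# Shares from the question; they sum to 1.03, so they are rescaled below.
shares = {"Lowest": 0.20, "Lower Middle": 0.09, "Middle": 0.50,
          "Upper Middle": 0.12, "Highest": 0.12}
total = sum(shares.values())

labels = []
for label, share in shares.items():
    # [label] * n repeats the list element, giving n separate strings;
    # label * n would instead build one long concatenated string.
    labels.extend([label] * int(share / total * len(df)))

# int() truncates, so top up with the last class until the length matches the index.
labels.extend([list(shares)[-1]] * (len(df) - len(labels)))

df["Annual Income"] = labels
print(df["Annual Income"].value_counts())
```

This fills the classes in row order, so the first rows are all "Lowest". If the classes should be spread randomly over the customers, shuffle `labels` before assigning it, for example with `random.shuffle(labels)`.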

Upvotes: 0

Views: 61

Answers (1)

Chris

Reputation: 16147

You can use NumPy to make a weighted random choice. numpy.random.choice takes a list of choices, the number of draws to make, and the probabilities (p). Generate the array this way, then simply assign it with df['Annual Income'] = incomes.

I've printed the value counts so you can see the totals. Because the draws are random, the counts will differ slightly on every run.

Also, I had to tweak the probabilities so they add up to 100% (the shares in the question add up to 103%); choice raises a ValueError if p does not sum to 1.
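Rather than adjusting the percentages by hand, they can also be rescaled so they sum to 1, which is what the `p` argument of `numpy.random.choice` requires. Here is a minimal sketch using the question's original shares. The names `raw` and `p` are only for illustration. Each class keeps its relative size, so "Middle" ends up at about 48.5% rather than 50%.

```python
import numpy as np

classes = ["Lowest", "Lower Middle", "Middle", "Upper Middle", "Highest"]
# Shares from the question; they sum to 1.03, so numpy.random.choice would reject them.
raw = np.array([0.20, 0.09, 0.50, 0.12, 0.12])

# Dividing by the total makes the probabilities sum to 1 while keeping their ratios.
p = raw / raw.sum()

incomes = np.random.choice(classes, 500, p=p)
print(p.round(3))
```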

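import pandas as pd
from numpy.random import choice

# Draw 500 labels, each picked independently with the given probabilities (must sum to 1)
incomes = choice(['Lowest', 'Lower Middle', 'Middle', 'Upper Middle', 'Highest'], 500,
                 p=[.2, .09, .49, .11, .11])

df = pd.DataFrame({'Annual Income': incomes})

# Count how many rows landed in each class
df.value_counts()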

Annual Income
Middle           245
Lowest            87
Upper Middle      66
Highest           57
Lower Middle      45
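If the same column should come out on every run (for example, for testing), NumPy's Generator API (`numpy.random.default_rng`) accepts a seed. The same seed then yields the same draws. This is a sketch under that assumption. The seed value 42 and the helper `draw_incomes` are arbitrary, hypothetical choices. The probabilities are the adjusted ones from this answer.

```python
import numpy as np
import pandas as pd

classes = ["Lowest", "Lower Middle", "Middle", "Upper Middle", "Highest"]
p = [0.2, 0.09, 0.49, 0.11, 0.11]  # the adjusted probabilities used in this answer

def draw_incomes(seed, n=500):
    # Hypothetical helper: a Generator seeded with the same value
    # produces the same sequence of weighted draws.
    rng = np.random.default_rng(seed)
    return rng.choice(classes, size=n, p=p)

df = pd.DataFrame({"Annual Income": draw_incomes(42)})
print(df["Annual Income"].value_counts())
```

The counts still only approximate the target shares. For exact shares, build the labels deterministically and shuffle them instead of sampling.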

Upvotes: 1
