Gokturk Demir

Reputation: 21

Creating a new column in a dataframe based on conditions

For the dataframe df:

import pandas as pd

dummy_data1 = {'category': ['White', 'Black', 'Hispanic', 'White'],
               'Pop': ['75', '85', '90', '100'],
               'White_ratio': [0.6, 0.4, 0.7, 0.35], 'Black_ratio': [0.3, 0.2, 0.1, 0.45],
               'Hispanic_ratio': [0.1, 0.4, 0.2, 0.20]}
df = pd.DataFrame(dummy_data1, columns=['category', 'Pop', 'White_ratio', 'Black_ratio', 'Hispanic_ratio'])

I want to add a new column, 'pop_n', to this dataframe by first checking the category and then multiplying the value in 'Pop' by the ratio in the corresponding column. For the first row, the category is 'White', so it should multiply 75 by 0.60 and put 45 in the pop_n column. I thought about writing something like:

df['pop_n'] = (df['Pop'].astype(int) * df['White_ratio']).where(df['category'] == 'White')

This works, but only for one category. I would appreciate any help with this.

Thanks.

Upvotes: 0

Views: 77

Answers (2)

DYZ

Reputation: 57105

Locate the columns that have underscores in their names:

to_rename = {x: x.split("_")[0] for x in df if "_" in x}

Find the matching factors:

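stack = df.rename(columns=to_rename)\
          .set_index('category').stack()
# Keep only the entries whose column name equals the row's category.
# list(...) is needed on Python 3, where map returns an iterator.
factors = stack[list(map(lambda x: x[0] == x[1], stack.index))]\
          .reset_index(drop=True)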

Multiply the original data by the factors:

df['pop_n'] = df['Pop'].astype(int) * factors

#   category  Pop  White_ratio  Black_ratio  Hispanic_ratio pop_n
#0     White   75         0.60         0.30             0.1    45
#1     Black   85         0.40         0.20             0.4    17
#2  Hispanic   90         0.70         0.10             0.2    18
#3     White  100         0.35         0.45             0.2    35
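
For comparison, the same matching can be sketched without stack by building the ratio column name from each row's category. This is only a minimal sketch, assuming every category has a matching '<category>_ratio' column; the helper name ratio_for_row is hypothetical and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['White', 'Black', 'Hispanic', 'White'],
    'Pop': ['75', '85', '90', '100'],
    'White_ratio': [0.6, 0.4, 0.7, 0.35],
    'Black_ratio': [0.3, 0.2, 0.1, 0.45],
    'Hispanic_ratio': [0.1, 0.4, 0.2, 0.20],
})

def ratio_for_row(row):
    # Hypothetical helper: read the ratio column named after this row's category.
    return row[row['category'] + '_ratio']

# 'Pop' holds strings in the question's data, so convert before multiplying.
df['pop_n'] = df['Pop'].astype(int) * df.apply(ratio_for_row, axis=1)
```

A row-wise apply is usually slower than the vectorized approaches on large frames, but it makes the per-row rule easy to read and check.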

Upvotes: 0

Erfan

Reputation: 42946

Using DataFrame.filter and DataFrame.lookup:

First we use filter to get the columns with ratio in the name. Then we rename them, keeping only the part of each name before the underscore.

Finally we use lookup to match the category values to these columns.

df['Pop'] = df['Pop'].astype(int)  # 'Pop' holds strings in the question's data
df2 = df.filter(like='ratio').rename(columns=lambda x: x.split('_')[0])
df['pop_n'] = df2.lookup(df.index, df['category']) * df['Pop']

   category  Pop  White_ratio  Black_ratio  Hispanic_ratio  pop_n
0     White   75         0.60         0.30             0.1   45.0
1     Black   85         0.40         0.20             0.4   17.0
2  Hispanic   90         0.70         0.10             0.2   18.0
3     White  100         0.35         0.45             0.2   35.0
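
Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0. On recent versions, the lookup step can be sketched with NumPy integer indexing instead. The following is a minimal sketch under the same assumptions as above (every category matches one renamed ratio column); the names col_idx and ratios are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': ['White', 'Black', 'Hispanic', 'White'],
    'Pop': ['75', '85', '90', '100'],
    'White_ratio': [0.6, 0.4, 0.7, 0.35],
    'Black_ratio': [0.3, 0.2, 0.1, 0.45],
    'Hispanic_ratio': [0.1, 0.4, 0.2, 0.20],
})
df['Pop'] = df['Pop'].astype(int)

# Same preparation as above: keep the ratio columns, named by category.
df2 = df.filter(like='ratio').rename(columns=lambda x: x.split('_')[0])

# Column position of each row's category (-1 would mean "no such column").
col_idx = df2.columns.get_indexer(df['category'])

# Pick one ratio per row by (row position, column position).
ratios = df2.to_numpy()[np.arange(len(df)), col_idx]

df['pop_n'] = ratios * df['Pop']
```

If some categories might be missing from the ratio columns, it is worth checking col_idx for -1 before indexing, since -1 would silently select the last column.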

Upvotes: 2
