Reputation: 95
I have a dataframe that looks like this
Index Variable1 Value1 Variable2 Value2 Cat Dog Cow
1 Cat 7 Sheep 7 0 0 0
2 Sheep 2 Cat 6 0 0 0
3 Cow 3 Dog 2 0 0 0
How can I efficiently populate the Cat, Dog and Cow columns with the values in the Value column when the Variable column is equal to the appropriate column name? So it looks something like this
Index Variable1 Value1 Variable2 Value2 Cat Dog Cow
1 Cat 7 Sheep 7 7 0 0
2 Sheep 2 Cat 6 6 0 0
3 Cow 3 Dog 2 0 2 3
I've made a nested for loop that loops over each "Variable" column and then each row in that column that populates the data of each animal based on the value in that cell. But I'm 100% sure this is bad way of doing this.
Upvotes: 3
Views: 158
Reputation: 863256
Use wide_to_long
for reshape, then filter values by list by DataFrame.query
, reshape by Series.unstack
and last use DataFrame.update
:
df = pd.DataFrame({'Variable1': ['Cat', 'Sheep', 'Cow'],
"Value1":[7, 2, 3],
"Variable2": ['Sheep','Cat','Dog'],
'Value2':[7,6,2],
'Cat':[0,0,0],
'Dog':[0,0,0],
'Cow':[0,0,0]}, index=[1,2,3])
L = ['Cat','Dog','Cow']
#or if possible select last 3 column names
#L = df.columns[-3:]
df1 = (pd.wide_to_long(df.reset_index(), ['Variable','Value'],i='index', j='tmp')
.reset_index(level=1, drop=True)
.query("Variable in @L")
.set_index('Variable', append=True)['Value']
.unstack(fill_value=0))
print (df1)
Variable Cat Cow Dog
index
1 7 0 0
2 6 0 0
3 0 3 2
df.update(df1)
print (df)
Variable1 Value1 Variable2 Value2 Cat Dog Cow
1 Cat 7 Sheep 7 7 0 0
2 Sheep 2 Cat 6 6 0 0
3 Cow 3 Dog 2 0 2 3
Upvotes: 0
Reputation: 28709
#create variables
vals = df.filter(like = 'Value').columns
variables = df.filter(like = 'Variable').columns
animals = df.iloc[:,-3:].columns
#lump all 'Variable_' and 'Value_' into one df
res = pd.concat(df.filter(ent).set_axis(['val','var'],axis=1) for ent in zip(vals,variables))
res
val var
0 7 Cat
1 2 Sheep
2 3 Cow
0 7 Sheep
1 6 Cat
2 2 Dog
#pivot res
out = (res
.pivot(columns='var',values='val')
.fillna(0)
.astype(int)
.filter(animals)
)
out
var Cat Dog Cow
0 7 0 0
1 6 0 0
2 0 2 3
#final result
result = pd.concat([df.iloc[:,:-3],out],axis=1)
result
Index Variable1 Value1 Variable2 Value2 Cat Dog Cow
0 1 Cat 7 Sheep 7 7 0 0
1 2 Sheep 2 Cat 6 6 0 0
2 3 Cow 3 Dog 2 0 2 3
Upvotes: 1
Reputation: 1441
A good solution would be vectorized operations, which are generally faster than loops. NumPy comes into its own here with np.where:
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict({'V1': ['Cat', 'Sheep', 'Cow'],
"Va1":[7, 2, 3], "v2": ['Sheep','Cat','Dog'], 'va2':[7,6,2]})
df['Cat'] = np.where(df['V1'] == 'Cat', df['Va1'], np.where(df['v2'] == 'Cat', df['va2'], 0))
df['Dog'] = np.where(df['V1'] == 'Dog', df['Va1'], np.where(df['v2'] == 'Dog', df['va2'], 0))
df['Cow'] = np.where(df['V1'] == 'Cow', df['Va1'], np.where(df['v2'] == 'Cow', df['va2'], 0))
Basically it checks Variable1 for the animal in question, and on matching fills with Value1, else it does the same check for Variable and Value 2, otherwise filles with 0.
Upvotes: 0