Reputation: 1282
I have seen quite some questions on this but I still can't put them together for this particular problem.
I have a df like so;
idx value name1 %1 name2 %2 name3 %3
0 100 person1 0.3 person2 0.5 person3 0.2
1 100 person4 1.0 None NaN None None
2 100 person1 0.6 person5 0.4 None None
Generated like so: pd.DataFrame(columns= ['value','name1','%1','name2','%2','name3','%3'],data=[[100,'person1',0.3,'person2',0.5,'person3','0.2'],[100,'person4',1], [100,'person1',0.6,'person5',0.4]])
I would like to split out each row with multiple names into their own rows like so:
idx value name
0 30 person1
1 50 person2
2 20 person3
3 100 person4
4 60 person1
5 40 person5
Need to take a percentage of the initial value for each unique person and make a new row for them. Eg. person1 row 0 - 100 * 0.3 (%1 value).
Hope that is clear. Any help is much appreciated.
Upvotes: 1
Views: 150
Reputation: 29
First let's add the information we need:
a['value * %1'] = a['value'] * a['%1']
a['value * %2'] = a['value'] * a['%2']
a['value * %3'] = a['value'] * a['%3']
The result of which is:
value name1 %1 name2 %2 name3 %3 value * %1 value * %2 value * %3
0 100 person1 0.3 person2 0.5 person3 0.2 30.0 50.0 20.0
1 100 person4 1.0 None NaN None NaN 100.0 NaN NaN
2 100 person1 0.6 person5 0.4 None NaN 60.0 40.0 NaN
Now we just need to create a new DataFrame and throw in the values there:
df = pd.DataFrame()
df['value'] = a['value * %1'].tolist() + a['value * %2'].tolist() + a['value * %3'].tolist()
df['name'] = a['name1'].tolist() + a['name2'].tolist() + a['name3'].tolist()
And drop the Nan values in the end:
df.dropna()
There is probably a better way. But that's what I've come up with
Upvotes: 0
Reputation: 12201
Here's a multi-stage solution, with annotations in the comments:
import pandas as pd
df = pd.DataFrame(columns=['value', 'name1', '%1', 'name2', '%2', 'name3', '%3'],
data=[[100, 'person1', 0.3, 'person2', 0.5, 'person3', '0.2'],
[100, 'person4', 1], [100, 'person1', 0.6, 'person5', 0.4]])
# Move the name columns below each other in rows
df1 = pd.melt(df, id_vars=['value'], value_vars=['name1', 'name2', 'name3'],
value_name='name')
# Move the percentage columns below each other in rows
df2 = pd.melt(df, id_vars=['value'], value_vars=['%1', '%2', '%3'],
value_name='percentage')
# Some input of percentages was string (note '0.2' in the question);
# let's make it's all float
df2['percentage'] = df2['percentage'].astype(float)
# NaNs are equivalent to zero in this case; easier to calculate with 0.0
df2 = df2.fillna(0)
# We can safely concatenate the two frames, under the assumption that in df1,
# the various name and percentage columns match
df3 = pd.concat([df1, df2], axis=1)
# Remove duplicated columns from the concatenation ('value')
df3 = df3.loc[:, ~df3.columns.duplicated()]
# Calculate the actual procentual values
df3.loc[:, 'value'] = df3['value'] * df3['percentage']
# dropna() will remove any row with a NaN/None anywhere. Since we've already
# replaced the percentages with 0.0, this will drop rows that have a
# 'name' of None
df4 = df3.dropna()
# Select the two relevant columns
df4 = df4[['value', 'name']]
print(df4)
value name
0 30.0 person1
1 100.0 person4
2 60.0 person1
3 50.0 person2
5 40.0 person5
6 20.0 person3
Upvotes: 1
Reputation: 494
you can try this:
df
value name1 %1 name2 %2 name3 %3
0 100 person1 0.3 person2 0.5 person3 0.2
1 100 person4 1.0 None NaN None NaN
2 100 person1 0.6 person5 0.4 None NaN
def get_value(sr):
dict_={}
for i in range(1,4):
if sr['name'+str(i)] is None:
continue
dict_[sr['name'+str(i)]] = sr['value']*sr['%'+str(i)]
return pd.Series(dict_)
df_new = df.apply(lambda x : get_value(x), axis=1).stack().reset_index()
df_new
level_0 level_1 0
0 0 person1 30.0
1 0 person2 50.0
2 0 person3 20.0
3 1 person4 100.0
4 2 person1 60.0
5 2 person5 40.0
Upvotes: 0