Reputation: 4200
I have a data frame to pivot to longer:
import pandas as pd
import io
_1 = pd.read_csv(io.StringIO(
"""date; origin; val_one; val_two; aaa; bbb; ccc; ddd; eee; fff
10/11/2009; aaa; 1; 0; 0; 0; 0; 0; 1; 0
10/11/2009; bbb; 0; 1; 1; 0; 0; 0; 0; 1
10/11/2009; ccc; 0; 1; 0; 0; 0; 0; 0; 0
10/11/2009; ddd; 0; 2; 0; 1; 1; 1; 0; 0"""),
sep=";").set_axis(['date', 'origin', 'val_one', 'val_two',
'aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'], axis=1)
I now want to create origin-target (rows aaa
:fff
) dyads. Rows where no target exists (i.e. none of the target columns is 1
, e.g. the third row, index 2) should be dropped; rows where more than one target exists (e.g. the second row, index 1, where aaa
is 1
and fff
is one of the target columns is 1
) should be turned into two rows. The expected output is:
_2_targ = pd.read_csv(io.StringIO(
"""date; origin; val_one; val_two; target
10/11/2009; aaa; 1; 0; eee
10/11/2009; bbb; 0; 1; aaa
10/11/2009; bbb; 0; 1; fff
10/11/2009; ddd; 0; 2; bbb
10/11/2009; ddd; 0; 2; ccc
10/11/2009; ddd; 0; 2; ddd"""),
sep=';').set_axis(['date', 'origin', 'val_one', 'val_two', 'target'])
I have tried using pd.melt
to no avail (see below) - this creates all dyads, even the ones I don't want to keep (because no target column is 1
).
_2 = pd.melt(_1,
id_vars=['date', 'origin', 'val_one', 'val_two'],
value_vars=['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'],
var_name='target', value_name='tmp')
What am I missing?
Upvotes: 0
Views: 30
Reputation: 4548
I believe you can melt exactly as you are doing, and then drop rows from your long_df that have a value of 0 for tmp
import pandas as pd
import io
#nice reproducible input!
df1 = pd.read_csv(io.StringIO(
"""date; origin; val_one; val_two; aaa; bbb; ccc; ddd; eee; fff
10/11/2009; aaa; 1; 0; 0; 0; 0; 0; 1; 0
10/11/2009; bbb; 0; 1; 1; 0; 0; 0; 0; 1
10/11/2009; ccc; 0; 1; 0; 0; 0; 0; 0; 0
10/11/2009; ddd; 0; 2; 0; 1; 1; 1; 0; 0"""),
sep=";").set_axis(['date', 'origin', 'val_one', 'val_two',
'aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'], axis=1)
#Melt exactly the same as you did
long_df = pd.melt(
df1,
id_vars=['date', 'origin', 'val_one', 'val_two'],
value_vars=['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'],
var_name='target', value_name='tmp'
)
#Filter out rows where tmp is 0, sort to match your output, and drop the tmp column
long_df = long_df[long_df['tmp'].gt(0)].sort_values('origin').drop(columns='tmp').reset_index(drop=True)
long_df
The output is nearly as you show above, but I think your forgetting the bbb:fff
row
date origin val_one val_two target
0 10/11/2009 aaa 1 0 eee
1 10/11/2009 bbb 0 1 aaa
2 10/11/2009 bbb 0 1 fff
3 10/11/2009 ddd 0 2 bbb
4 10/11/2009 ddd 0 2 ccc
5 10/11/2009 ddd 0 2 ddd
I've also renamed your variables but that's obviously not important
Upvotes: 1