Mazz
Mazz

Reputation: 820

Reshape a dataframe for all values in a column

I have a dataframe like so:

number   type     X     Y           Z              
  1       red     101   NaN         101,NaN          
  2       blue    101   40,50       101,40,50
  3       green   401   70,NaN      101,70,NaN

Is there any way that I can flag each type element based on the column Z? Z is just a combined version of X and Y.

So something like:

 Number    Type         Z     X_or_Y
  1        red        101     X
  1        red        NaN     Y
  2        blue       101     X
  2        blue       40      Y
  2        blue       50      Y
  3        green      101     X
  3        green      70      Y
  3        green      NaN     Y

I'm thinking of the melt function but not sure how.

reshaped_data = pd.melt(df, id_vars =['number', 'type',
                           'Z'], value_vars =['X', 'Y'])

**Edit: ** The values in the X_or_Y column should come from the X ans Y columns in the initial dataframe

Upvotes: 1

Views: 124

Answers (1)

jezrael
jezrael

Reputation: 862591

Solution for pandas under 0.24+ with DataFrame.melt first, then Series.str.split and DataFrame.stack for explode splitted values by ,:

df1 = df.melt(id_vars =['number', 'type'], value_vars =['X', 'Y'], var_name='Flag')

s = (df1.pop('value').astype(str).str.split(',', expand=True)
       .stack()
       .reset_index(level=1, drop=True)
       .rename('Z'))
df1 = df1.join(s).sort_values('number').reset_index(drop=True)

print (df1)
   number   type Flag    Z
0       1    red    X  101
1       1    red    Y  nan
2       2   blue    X  101
3       2   blue    Y   40
4       2   blue    Y   50
5       3  green    X  401
6       3  green    Y   70
7       3  green    Y  NaN

Solution above pandas 0.24 with DataFrame.explode:

df1 = df.melt(id_vars =['number', 'type'], value_vars =['X', 'Y'], var_name='Flag')

df1 = (df1.assign(value = df1['value'].astype(str).str.split(','))
         .explode('value')
         .sort_values('number')
         .reset_index(drop=True))

print (df1)
   number   type Flag value
0       1    red    X   101
1       1    red    Y   nan
2       2   blue    X   101
3       2   blue    Y    40
4       2   blue    Y    50
5       3  green    X   401
6       3  green    Y    70
7       3  green    Y   NaN

Upvotes: 2

Related Questions