TheDaJon
TheDaJon

Reputation: 565

Python - Pandas dataframe columns swapping

I have a dataframe that looks like this:

  |    A     |     B     |     C     |     D
0 |  a |  b  |  c  |  d  |  e  |  f  |  g  |  h
1 |  a |  b  |  c  |  d  |  e  |  f  | NaN | NaN
2 |  a |  b  | NaN | NaN |  e  |  f  | NaN | NaN
3 |  a |  b  | NaN | NaN | NaN | NaN | NaN | NaN

and I want to change it to this:

    OBJ   VAL1   VAL2
0    A     a      b
1    A     a      b
2    A     a      b
3    A     a      b
4    A     a      b
5    A     a      b
6    B     c      d
7    B     c      d
8    C     e      f
9    C     e      f
10   C     e      f
11   D     g      h

so the multindex will convert to a column as well.

any help please?

and is there any good tutorial that explains exactly those kind of stuff so I will be able to do it and not just try and error?

Thank you

EDIT: My first originally dataframe looks like this:

      A       B       C       D
0   (a,b)   (c,d)   (e,f)   (g,h)
1   (a,b)   (c,d)   (e,f)    NaN
2   (a,b)    NaN    (e,f)    NaN
3   (a,b)    NaN     NaN     NaN

so in each cell there is a tuple.

Upvotes: 1

Views: 181

Answers (1)

jezrael
jezrael

Reputation: 862611

You can use DataFrame.from_records first, then double reset_index and if necessary sort_values for sorting all columns:

df = pd.DataFrame({"A": [('a','b'),('a','b'),('a','b'),('a','b')], 
                   'B': [('c','d'),('c','d'), np.nan,np.nan], 
                   'C':[('e','f'),('e','f'),('e','f'),np.nan],
                   'D':[('g','h'),np.nan,np.nan,np.nan]})
print (df)
        A       B       C       D
0  (a, b)  (c, d)  (e, f)  (g, h)
1  (a, b)  (c, d)  (e, f)     NaN
2  (a, b)     NaN  (e, f)     NaN
3  (a, b)     NaN     NaN     NaN

stacked = df.stack()
df1 = pd.DataFrame.from_records(stacked.tolist(), index = stacked.index)
        .reset_index(level=0, drop=True)
        .reset_index()
        .sort_values(['index',0,1])
df1.columns = ['OBJ','VAL1','VAL2']
print (df1)
  OBJ VAL1 VAL2
0   A    a    b
4   A    a    b
7   A    a    b
9   A    a    b
1   B    c    d
5   B    c    d
2   C    e    f
6   C    e    f
8   C    e    f
3   D    g    h

If your DataFrame is with MultiIndex in columns, need stack first:

stacked = df.stack()
df1 = pd.DataFrame.from_records(stacked.tolist(), index = stacked.index) \
        .unstack(1) \
        .swaplevel(0, 1, 1) \
        .sort_index(axis=1) \
        .replace({None:np.nan})

print (df1)
   A       B         C         D     
   0  1    0    1    0    1    0    1
0  a  b    c    d    e    f    g    h
1  a  b    c    d    e    f  NaN  NaN
2  a  b  NaN  NaN    e    f  NaN  NaN
3  a  b  NaN  NaN  NaN  NaN  NaN  NaN

df2 = df1.stack(0)
         .reset_index(level=0, drop=True)
         .reset_index()
         .sort_values(['index',0,1])

df2.columns = ['OBJ','VAL1','VAL2']
print (df2)
  OBJ VAL1 VAL2
0   A    a    b
4   A    a    b
7   A    a    b
9   A    a    b
1   B    c    d
5   B    c    d
2   C    e    f
6   C    e    f
8   C    e    f
3   D    g    h

Upvotes: 2

Related Questions