vesuvius
vesuvius

Reputation: 435

Change dataframe column values into rows if column names match

I normalized a json object into a dataframe using json_normalize which looks something like this:

ID Name     Email_id      ID Name Email_id     ID Name  Email_id
1   A      [email protected]     2  B    [email protected]  3  C    [email protected]

I wanna convert the column values into rows like this:-

ID   Name   Email_id
1     A      [email protected]
2     B      [email protected]
3     C      [email protected]

but I'm not able to do that. I tried pd.melt() but it gives me Data must be 1-dimensional exception.

Upvotes: 1

Views: 73

Answers (2)

If you do know such data structure is consistent, you can just slice the data by index, then concatenate them:

pd.concat([df.iloc[:, i:i+3] for i in range(0, df.shape[1], 3)])

To make sure to get the column IDs:

import numpy as np

# Get the target indexes
idx = np.arange(df.shape[1])[df.columns=='ID']
idx = np.append(idx, df.shape[1])

# Slice and concatenate data
pd.concat([df.iloc[:, idx[i]:idx[i+1]] for i in range(len(idx)-1)])

Upvotes: 1

jezrael
jezrael

Reputation: 862691

You can select only one column, but because duplicated columns names are selected all columns with same label, then convert to 1d numpy array and pass to DataFrame constructor:

print (df['ID'])
   ID  ID  ID
0   1   2   3

df = pd.DataFrame({'ID': df['ID'].to_numpy().ravel(),
                   'Name': df['Name'].to_numpy().ravel(),
                   'Email_id': df['Email_id'].to_numpy().ravel()})
print (df)
   ID Name     Email_id
0   1    A  [email protected]
1   2    B  [email protected]
2   3    C  [email protected]

Another idea is create MultiIndex in columns by GroupBy.cumcount and reshape by DataFrame.stack:

s = df.columns.to_series()

df.columns = [s, s.groupby(s).cumcount()]
print (df)

  ID Name     Email_id ID Name     Email_id ID Name     Email_id
   0    0            0  1    1            1  2    2            2
0  1    A  [email protected]  2    B  [email protected]  3    C  [email protected]

df = df.stack().reset_index(drop=True)
print (df)
      Email_id  ID Name
0  [email protected]   1    A
1  [email protected]   2    B
2  [email protected]   3    C

Upvotes: 1

Related Questions