Prince Francis
Prince Francis

Reputation: 3097

Python dataframe : converting columns into rows

I have a dataframe as follows

d = {'Movie' : ['The Shawshank Redemption', 'The Godfather'],
        'FirstName1': ['Tim', 'Marlon'],
        'FirstName2': ['Morgan', 'Al'],
        'LastName1': ['Robbins', 'Brando'],
        'LastName2': ['Freeman', 'Pacino'],
        'ID1': ['TM', 'MB'],
        'ID2': ['MF', 'AP']
        }
df = pd.DataFrame(d)
df

enter image description here

I would like to re-arrange it into a 4 column dataframe, by converting Firstname1, LastName1, FirstName2, LastName2, ID1, ID2 into 3 column rows as FirstName, LastName, ID then column movie repeats as follows. enter image description here

In sql we do it as follows

select Movie as Movie, FirstName1 as FirstName, LastName1 as LastName, ID1 as ID from table
union
select Movie as Movie, FirstName2 as FirstName, LastName2 as LastName, ID2 as ID from table

Can we achieve it using pandas ?

Upvotes: 3

Views: 283

Answers (4)

Rajat Jain
Rajat Jain

Reputation: 2022

Try below:

d1 = df.filter(regex="1$|Movie").rename(columns=lambda x: x[:-1])
d2 = df.filter(regex="2$|Movie").rename(columns=lambda x: x[:-1])
pd.concat([d1, d2]).rename({'Movi':'Movie'})

Upvotes: 0

jezrael
jezrael

Reputation: 862511

If possible number in column names more like 9 use Series.str.extract for get integers with values before to MultiIndex to columns, so possible DataFrame.stack:

df = df.set_index('Movie')
df1 = df.columns.to_series().str.extract('([a-zA-Z]+)(\d+)')
df.columns = pd.MultiIndex.from_arrays([df1[0], df1[1].astype(int)])

df = df.rename_axis((None, None), axis=1).stack().reset_index(level=1, drop=True).reset_index()
print (df)
                      Movie FirstName  ID LastName
0  The Shawshank Redemption       Tim  TM  Robbins
1  The Shawshank Redemption    Morgan  MF  Freeman
2             The Godfather    Marlon  MB   Brando
3             The Godfather        Al  AP   Pacino

If not use indexing for get last values of columns names with all previous and pass to MultiIndex.from_arrays:

df = df.set_index('Movie')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[:-1], df.columns.str[-1].astype(int)])
df = df.stack().reset_index(level=1, drop=True).reset_index()
print (df)
                      Movie FirstName  ID LastName
0  The Shawshank Redemption       Tim  TM  Robbins
1  The Shawshank Redemption    Morgan  MF  Freeman
2             The Godfather    Marlon  MB   Brando
3             The Godfather        Al  AP   Pacino

Upvotes: 3

Jondiedoop
Jondiedoop

Reputation: 3353

df = df.set_index('Movie')
df.columns = pd.MultiIndex.from_tuples([(col[:-1], col[-1:]) for col in df.columns])

df.stack()

#                           FirstName  ID LastName
#Movie                                            
#The Shawshank Redemption 1       Tim  TM  Robbins
#                         2    Morgan  MF  Freeman
#The Godfather            1    Marlon  MB   Brando
#                         2        Al  AP   Pacino

Use the power of MultiIndex! With from_tuples you create a DataFrame that has one column for FirstNames, divided in FirstName1 and FirstName2 (see below) and similar for ID and LastName. With stack you convert it into rows for each. Before you do this, make Movie the Index to exclude it from what you are doing. You could use reset_index() to regain everything as columns, but I'm not sure if you want that.


Before stack:

#                         FirstName         LastName           ID    
#                                 1       2        1        2   1   2
#Movie                                                               
#The Shawshank Redemption       Tim  Morgan  Robbins  Freeman  TM  MF
#The Godfather               Marlon      Al   Brando   Pacino  MB  AP

Upvotes: 2

Moschte
Moschte

Reputation: 102

I think an easy way to do this is to use the copy function from Pandas. You can copy the columns "Movie", "FirstName", "LastName", "ID" to a new table. Then delete the columns you don't need in your first column. You can also create a new table for the other.

new = d['Movie', 'FirstName', 'LastName', 'ID].copy

Upvotes: 0

Related Questions