Reputation: 3097
I have a dataframe as follows
d = {'Movie' : ['The Shawshank Redemption', 'The Godfather'],
'FirstName1': ['Tim', 'Marlon'],
'FirstName2': ['Morgan', 'Al'],
'LastName1': ['Robbins', 'Brando'],
'LastName2': ['Freeman', 'Pacino'],
'ID1': ['TM', 'MB'],
'ID2': ['MF', 'AP']
}
df = pd.DataFrame(d)
df
I would like to re-arrange it into a 4 column dataframe,
by converting Firstname1, LastName1, FirstName2, LastName2, ID1, ID2
into 3 column rows as FirstName, LastName, ID
then column movie
repeats as follows.
In sql
we do it as follows
select Movie as Movie, FirstName1 as FirstName, LastName1 as LastName, ID1 as ID from table
union
select Movie as Movie, FirstName2 as FirstName, LastName2 as LastName, ID2 as ID from table
Can we achieve it using pandas ?
Upvotes: 3
Views: 283
Reputation: 2022
Try below:
d1 = df.filter(regex="1$|Movie").rename(columns=lambda x: x[:-1])
d2 = df.filter(regex="2$|Movie").rename(columns=lambda x: x[:-1])
pd.concat([d1, d2]).rename({'Movi':'Movie'})
Upvotes: 0
Reputation: 862511
If possible number in column names more like 9
use Series.str.extract
for get integers with values before to MultiIndex
to columns, so possible DataFrame.stack
:
df = df.set_index('Movie')
df1 = df.columns.to_series().str.extract('([a-zA-Z]+)(\d+)')
df.columns = pd.MultiIndex.from_arrays([df1[0], df1[1].astype(int)])
df = df.rename_axis((None, None), axis=1).stack().reset_index(level=1, drop=True).reset_index()
print (df)
Movie FirstName ID LastName
0 The Shawshank Redemption Tim TM Robbins
1 The Shawshank Redemption Morgan MF Freeman
2 The Godfather Marlon MB Brando
3 The Godfather Al AP Pacino
If not use indexing for get last values of columns names with all previous and pass to MultiIndex.from_arrays
:
df = df.set_index('Movie')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[:-1], df.columns.str[-1].astype(int)])
df = df.stack().reset_index(level=1, drop=True).reset_index()
print (df)
Movie FirstName ID LastName
0 The Shawshank Redemption Tim TM Robbins
1 The Shawshank Redemption Morgan MF Freeman
2 The Godfather Marlon MB Brando
3 The Godfather Al AP Pacino
Upvotes: 3
Reputation: 3353
df = df.set_index('Movie')
df.columns = pd.MultiIndex.from_tuples([(col[:-1], col[-1:]) for col in df.columns])
df.stack()
# FirstName ID LastName
#Movie
#The Shawshank Redemption 1 Tim TM Robbins
# 2 Morgan MF Freeman
#The Godfather 1 Marlon MB Brando
# 2 Al AP Pacino
Use the power of MultiIndex! With from_tuples
you create a DataFrame
that has one column for FirstNames, divided in FirstName1 and FirstName2 (see below) and similar for ID and LastName. With stack
you convert it into rows for each. Before you do this, make Movie
the Index to exclude it from what you are doing. You could use reset_index()
to regain everything as columns, but I'm not sure if you want that.
Before stack
:
# FirstName LastName ID
# 1 2 1 2 1 2
#Movie
#The Shawshank Redemption Tim Morgan Robbins Freeman TM MF
#The Godfather Marlon Al Brando Pacino MB AP
Upvotes: 2
Reputation: 102
I think an easy way to do this is to use the copy function from Pandas. You can copy the columns "Movie", "FirstName", "LastName", "ID" to a new table. Then delete the columns you don't need in your first column. You can also create a new table for the other.
new = d['Movie', 'FirstName', 'LastName', 'ID].copy
Upvotes: 0