user60108
user60108

Reputation: 3452

pandas - transform 1 dataframe into two dataframe with foreign key

I have a Dataframe df as follows

df = pd.DataFrame(
    [['tom', 28, "tesla", "model s" ], 
    ['nick', 30, "ford", "mustang"], 
    ['juli', 56, "tesla", "model s"]] , 
    columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
)
df.head()

    PersonName  PersonAge   Car CarModel
0   tom     28  tesla   model s
1   nick    30  ford    mustang
2   juli    56  tesla   model s

I would like to transform it into two dataframe as follows

Person=
PersonName PersonAge CarId
tom        28        1 
nick       30        2
juli       56        1

Cars=
id Car    Model
1  Tesla  Model S
2  Ford   Mustang

What would be the correct way to do this?

Upvotes: 0

Views: 430

Answers (3)

frankr6591
frankr6591

Reputation: 1247

The following gives you what you want.

Use groupby to create Cars dataframe

CarsDF = pd.DataFrame(df.groupby(['Car','CarModel']).Car.count())
CarsDF.columns=['v']
CarsDF = CarsDF.reset_index().drop('v', axis=1)
CarsDF.index.name='CarId'
CarsDF

Then use merge to add CarId

PersonDF = df.merge(CarsDF.reset_index())[['PersonName','PersonAge','CarId']]

Upvotes: 1

tdelaney
tdelaney

Reputation: 77347

You can assign columns to new dataframes to create Person and Cars.

>>> df = pd.DataFrame(
...     [['tom', 28, "tesla", "model s" ], 
...     ['nick', 30, "ford", "mustang"], 
...     ['juli', 56, "tesla", "model s"]] , 
...     columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
... )
>>> df
  PersonName  PersonAge    Car CarModel
0        tom         28  tesla  model s
1       nick         30   ford  mustang
2       juli         56  tesla  model s
>>> 
>>> Person = df[["PersonName", "PersonAge"]]
>>> Cars = df[["Car", "CarModel"]]
>>> Person
  PersonName  PersonAge
0        tom         28
1       nick         30
2       juli         56
>>> Cars
     Car CarModel
0  tesla  model s
1   ford  mustang
2  tesla  model s

Notice that your original dataframe had an index and that index is preserved when creating the new ones. That index can be used as your foreign key. In fact, the indexing part is frequently transparent in pandas. To select a car from the person name, you could

>>> Cars[Person["PersonName"] == "nick"]
    Car CarModel
1  ford  mustang

If you want to turn the index into its own column you can do so

>>> Cars["id"] = Cars.index
>>> Person["CarId"] = Person.index
>>> Person
  PersonName  PersonAge  CarId
0        tom         28      0
1       nick         30      1
2       juli         56      2
>>> Cars
     Car CarModel  id
0  tesla  model s   0
1   ford  mustang   1
2  tesla  model s   2

But sticking with operations on the index is generally more convenient.... unless you reset the index, then, you have to go with a different column

Upvotes: 2

Serial Lazer
Serial Lazer

Reputation: 1669

This should work:

import pandas as pd

df = pd.DataFrame(
    [['tom', 28, "tesla", "model s" ],
    ['nick', 30, "ford", "mustang"],
    ['juli', 56, "tesla", "model s"]] ,
    columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
)
df['Comb'] = df['Car'] + "," + df['CarModel']
car_df = pd.DataFrame({'CarId': range(1, len(df['Comb'].unique())+1), 'Comb': df['Comb'].unique()})
car_df = car_df.merge(df[['Comb', 'Car', 'CarModel']], on=['Comb']).drop_duplicates(subset=['Comb']).drop(columns=['Comb'])

df = df.merge(car_df, on=['Car', 'CarModel']).drop(columns=['Comb', 'Car', 'CarModel'])
car_df = car_df.set_index('CarId')
print(df)
print()
print(car_df)

Output:

  PersonName  PersonAge  CarId
0        tom         28      1
1       juli         56      1
2       nick         30      2

         Car CarModel
CarId                
1      tesla  model s
2       ford  mustang

Upvotes: 1

Related Questions