Reputation: 1488
I have two data frames as below.
df_names = pl.DataFrame({'last_name':['Williams','Henry','XYX','Smith','David','Freeman','Walter','Test_A'],
'first_name':['Henry','Williams','ABC','David','Smith','Walter','Freeman','Test_B']})
Another frame as,
df_updates = pl.DataFrame({'last_name':['Williams','Henry','Smith','David','Freeman','Walter'],
'first_name':['Henry','Williams','David','Smith','Walter','Freeman'],
'last_name_x':['Williams','Williams','Smith','Smith','Freeman','Freeman'],
'first_name_x':['Henry','Henry','David','David','Walter','Walter']})
Here I would like to update the last_name and first_name from df_names Dataframe whichever last and first names matched in dataframe- df_updates with last_name_x and first_name_x values.
here is an expected output as:
Upvotes: 0
Views: 125
Reputation: 476
# Join dataframe:
# - on columns of interest.
# - left join (keep all rows from first dataframe).
In [120]: (
...: df_names
...: .join(
...: df_updates,
...: on=["last_name", "first_name"],
...: how="left"
...: )
...: )
Out[120]:
shape: (8, 4)
┌───────────┬────────────┬─────────────┬──────────────┐
│ last_name ┆ first_name ┆ last_name_x ┆ first_name_x │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str │
╞═══════════╪════════════╪═════════════╪══════════════╡
│ Williams ┆ Henry ┆ Williams ┆ Henry │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Henry ┆ Williams ┆ Williams ┆ Henry │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ XYX ┆ ABC ┆ null ┆ null │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Smith ┆ David ┆ Smith ┆ David │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ David ┆ Smith ┆ Smith ┆ David │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Freeman ┆ Walter ┆ Freeman ┆ Walter │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Walter ┆ Freeman ┆ Freeman ┆ Walter │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Test_A ┆ Test_B ┆ null ┆ null │
└───────────┴────────────┴─────────────┴──────────────┘
# Join dataframe:
# - on columns of interest.
# - left join (keep all rows from first dataframe).
# Depending if the last and first name were found the the df_updates table:
# - take the last and first name form *_ column of df_updates
# - or take original last and first name from df_names
# - and only retrain those values (rename columns)
In [121]: (
...: df_names
...: .join(
...: df_updates,
...: on=["last_name", "first_name"],
...: how="left"
...: )
...: .select(
...: [
...: pl.when(
...: pl.col("last_name_x").is_null()
...: ).then(
...: pl.col("last_name")
...: ).otherwise(
...: pl.col("last_name_x")
...: ).alias("last_name"),
...: pl.when(
...: pl.col("first_name_x").is_null()
...: ).then(
...: pl.col("first_name")
...: ).otherwise(
...: pl.col("first_name_x")
...: ).alias("first_name"),
...: ]
...: )
...: )
Out[121]:
shape: (8, 2)
┌───────────┬────────────┐
│ last_name ┆ first_name │
│ --- ┆ --- │
│ str ┆ str │
╞═══════════╪════════════╡
│ Williams ┆ Henry │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Williams ┆ Henry │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ XYX ┆ ABC │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Smith ┆ David │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Smith ┆ David │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Freeman ┆ Walter │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Freeman ┆ Walter │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Test_A ┆ Test_B │
└───────────┴────────────┘
Upvotes: 2