Reputation: 3
First, I haven't found this asked before - probably because I'm not using the right words to ask it. So if it has been asked, please send me in that direction.
How can I combine two pandas data frames based on column AND row. My main dataframe has a column 'years' and a column 'county' among others. Ideally, I want to add another column 'percent' from the second data frame below.
For example, I have this image of my first df:
and I have another data frame with the same 'year' column and every other column name is a string value in the original "main" dataframe's 'county' column:
How can I combine these two data frames in a way that adds another column to the 'main df'? It would be helpful to first put the second data frame in the format where there are three columns: 'year', 'county', and 'percent'. If anyone can help me with this part, I can merge it.
Upvotes: 0
Views: 70
Reputation: 386
I think what you will want to do is transform the second dataframe to have a row for each year/county combination and then you can use a left join to combine the two. I believe the ```melt`` method will do this transformation. Try this:
melted_second_df = second_df.melt(id_vars=["year"], var_name="county", value_name="percent")
combined_df = first_df.merge(
right=melted_second_df,
on=["year", "county"],
how="left"
)
Upvotes: 1