jos97
jos97

Reputation: 395

How to create a dataframe depending on 2 others existing dataframes?

I have a dataframe df1 like this:

topic_id    topic   
1           Sun
2           Cloud
3           Rain

I have an other dataframe df2 that looks like this:

article_id  title    topics
    1       title_1  []
    2       title_2  [Sun, Cloud]
    3       title_3  [Rain]

My goal is to use df1 and df2 to create a dataframe df3.

I want to create a row in df3 everytime a topic in lists of topics column appears. And in this row I would like to have the article id of df2 when a topic appears and also the topic id of df1. If the list is empty nothing is created in df3.

I would like to have a new dataframe looking like this:

article_id  topic_id    topics
        2       1       Sun
        2       2       Cloud
        3       3       Rain

How can I do this ?

Upvotes: 1

Views: 27

Answers (1)

jezrael
jezrael

Reputation: 862511

Use DataFrame.explode with default inner join in DataFrame.merge, last change order of columns by list:

cols = ['article_id','topic_id','topics']
df = df2.explode('topics').merge(df1.rename(columns={'topic':'topics'}))[cols]
print (df)
   article_id  topic_id topics
0           2         1    Sun
1           2         2  Cloud
2           3         3   Rain

Upvotes: 2

Related Questions