Math Lover

Reputation: 177

Merge two data-sets in Python Pandas

I have two datasets in the format below and want to merge them into a single dataset based on City + Age + Gender. Thanks in advance.

Dataset1:

        City    Age  Gender            Source         Count
0  California  15-24  Female  Amazon Prime Video       14629
1  California  15-24  Female             Fubo TV        3840
2  California  15-24  Female                Hulu       54067
3  California  15-24  Female             Netflix       11713
4  California  15-24  Female            Sling TV       10642

Dataset2:

         City    Age  Gender           Source     Feeds
0  California  15-24  Female             Blogs    150
1  California  15-24  Female        Customsite     57
2  California  15-24  Female       Discussions     28
3  California  15-24  Female  Facebook Comment    555
4  California  15-24  Female           Google+     19

Expected resulting dataset:

      City    Age  Gender              Source  Count
California  15-24  Female  Amazon Prime Video  14629
California  15-24  Female             Fubo TV   3840
California  15-24  Female                Hulu  54067
California  15-24  Female             Netflix  11713
California  15-24  Female            Sling TV  10642
California  15-24  Female               Blogs    150
California  15-24  Female          Customsite     57
California  15-24  Female         Discussions     28
California  15-24  Female    Facebook Comment    555
California  15-24  Female             Google+     19

Note: Feeds and Count mean the same thing, so either one is fine as the column name in the merged dataset.

Upvotes: 1

Views: 3029

Answers (1)

jezrael

Reputation: 863301

Use pandas.concat, renaming the column first so that both DataFrames have the same column names and the values line up:

df = pd.concat([df1, df2.rename(columns={'Feeds':'Count'})], ignore_index=True)
print(df)
         City    Age  Gender              Source  Count
0  California  15-24  Female  Amazon Prime Video  14629
1  California  15-24  Female             Fubo TV   3840
2  California  15-24  Female                Hulu  54067
3  California  15-24  Female             Netflix  11713
4  California  15-24  Female            Sling TV  10642
5  California  15-24  Female               Blogs    150
6  California  15-24  Female          Customsite     57
7  California  15-24  Female         Discussions     28
8  California  15-24  Female    Facebook Comment    555
9  California  15-24  Female             Google+     19
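
To try this without the original files, here is a minimal self-contained sketch. The df1 and df2 frames are rebuilt by hand from the five sample rows shown in the question, and the `keys` and `unaligned` names are only illustrative. It also shows what happens when the rename is skipped, which is why the columns need to match:

```python
import pandas as pd

# Sample rows copied from the question; the real data would be loaded from files.
keys = {"City": "California", "Age": "15-24", "Gender": "Female"}
df1 = pd.DataFrame({
    **{k: [v] * 5 for k, v in keys.items()},
    "Source": ["Amazon Prime Video", "Fubo TV", "Hulu", "Netflix", "Sling TV"],
    "Count": [14629, 3840, 54067, 11713, 10642],
})
df2 = pd.DataFrame({
    **{k: [v] * 5 for k, v in keys.items()},
    "Source": ["Blogs", "Customsite", "Discussions", "Facebook Comment", "Google+"],
    "Feeds": [150, 57, 28, 555, 19],
})

# Without the rename, concat keeps both 'Count' and 'Feeds'
# and fills the cells that have no value with NaN.
unaligned = pd.concat([df1, df2], ignore_index=True)

# With the rename, the rows stack into a single 'Count' column.
df = pd.concat([df1, df2.rename(columns={"Feeds": "Count"})], ignore_index=True)
print(df)
```

ignore_index=True discards the original 0-4 labels of each frame and renumbers the result 0-9; without it, the index values would repeat.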

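Alternative with DataFrame.append (the pandas method, not Python's list.append). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this only works on older versions: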

df = df1.append(df2.rename(columns={'Feeds':'Count'}), ignore_index=True)
print(df)
         City    Age  Gender              Source  Count
0  California  15-24  Female  Amazon Prime Video  14629
1  California  15-24  Female             Fubo TV   3840
2  California  15-24  Female                Hulu  54067
3  California  15-24  Female             Netflix  11713
4  California  15-24  Female            Sling TV  10642
5  California  15-24  Female               Blogs    150
6  California  15-24  Female          Customsite     57
7  California  15-24  Female         Discussions     28
8  California  15-24  Female    Facebook Comment    555
9  California  15-24  Female             Google+     19

Upvotes: 2
