Combining dataframes of different lengths

Question

I have two dataframes, one of which is an extended version of the other. (1) How do I combine the two efficiently, matching rows where the dataframes share the same Name? (2) Also, is there a way to indent code by four spaces in the Stack Overflow box without typing four spaces on each line? That can be time-consuming.

More details

One is a full dataframe in which each value is listed multiple times (sortedregsubjdf). The other, called sortedcentralitydf, contains only the unique values of that dataframe (it is a dataframe of network centralities).

sortedregsubjdf

       Name                              Organization  Year  Centrality
6363   (Buz) Business And Commerce       doclist[524]  2012  0.503677
8383   (Buz) Business And Commerce       doclist[697]  2012  0.503677
1170   (Buz) Business And Commerce       doclist[103]  2012  0.503677
1579   (Eco) Economics News              doclist[140]  2013  0.500624
10979  (Gop) Provincial Government News  doclist[941]  2013  0.501232
4374   (Gop) Provincial Government News  doclist[368]  2013  0.501232
10988  (Npt) Not-For-Profits             doclist[942]  2013  0.498810

sortedcentralitiesdf (Business And Commerce appears only once because this dataframe contains unique values, whereas sortedregsubjdf has multiple rows per value)

     Name                              Centrality
316  (Buz) Business And Commerce       0.503677
448  (Eco) Economics News              0.500624
499  (Gop) Provincial Government News  0.501232
366  (Npt) Not-For-Profits             0.498810
217  (Pdt) New Products And Services   0.504600

This is the code I used to combine the two dataframes. Is there a more efficient way to do it?

for val in sortedcentralitydf.Name:
    for xval in sortedregsubjdf.Name:
        if val == xval:
            # print(val, xval)
            # Copy the centrality for this Name into every matching row.
            sortedregsubjdf.loc[sortedregsubjdf.Name == xval, 'Centrality'] = \
                sortedcentralitydf.Centrality[sortedcentralitydf.Name == val].iloc[0]

Greg · Accepted Answer

Pandas has a merge function. Something like this should work:

import pandas as pd
merged_df = pd.merge(sortedregsubjdf, sortedcentralitiesdf, on='Name')
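
One thing to watch for: if sortedregsubjdf already has its own Centrality column, as the sample in the question suggests, pandas will suffix the overlapping columns as Centrality_x and Centrality_y. Below is a minimal sketch of one way to handle that, using toy frames built from a few of the sample rows. The NaN placeholder column, how="left", and validate="many_to_one" are my assumptions about what is wanted, not something the question states.

```python
import pandas as pd

# Toy stand-ins for the question's frames; values copied from the sample rows.
sortedregsubjdf = pd.DataFrame({
    "Name": ["(Buz) Business And Commerce",
             "(Buz) Business And Commerce",
             "(Eco) Economics News"],
    "Organization": ["doclist[524]", "doclist[697]", "doclist[140]"],
    "Year": [2012, 2012, 2013],
    "Centrality": [float("nan")] * 3,  # assumed placeholder column to be filled
})
sortedcentralitiesdf = pd.DataFrame({
    "Name": ["(Buz) Business And Commerce",
             "(Eco) Economics News",
             "(Pdt) New Products And Services"],
    "Centrality": [0.503677, 0.500624, 0.504600],
})

# Drop the placeholder so the merge does not create Centrality_x / Centrality_y.
left = sortedregsubjdf.drop(columns="Centrality")

# how="left" keeps every row of the long frame, in its original order.
# validate="many_to_one" raises an error if a Name is duplicated in the lookup frame.
merged_df = pd.merge(left, sortedcentralitiesdf, on="Name",
                     how="left", validate="many_to_one")
print(merged_df)
```

With how="left", a Name that has no match in the centralities frame would end up with NaN in Centrality instead of being dropped, which is probably the safer default when the long frame is the one you care about.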
