mt88
mt88

Reputation: 3015

Inner Join list of DataFrames on Row Values

I have a list of dataframes in python pandas that have the same rowname and rowvalues. What I would like to do is produce one dataframe with them innerjoined on the rowvalues. I have looked online and found the merge function, but this isn't working because my rows aren't a column. Does anyone know the best way to do this? Is the solution to take the row values and turn it into a column, and if so how do you do that? Thanks for the help.

input:

          "happy"   
userid     
   1         2       
   2         8        
   3         9

          "sad"
userid
   1        9
   2        12
   3        11

output:

           "sad"   "happy"
userid
   1         9        2
   2         12       8
   3         11       9

Upvotes: 0

Views: 2743

Answers (2)

chrisaycock
chrisaycock

Reputation: 37930

It looks like your DataFrames have indices, in which case your merge() should indicate that's how it wants to proceed:

In [51]: df1
Out[51]:
        "happy"
userid
1             2
2             8
3             9

In [52]: df2
Out[52]:
        "sad"
userid
1           9
2          12
3          11

In [53]: pd.merge(df2, df1, left_index=True, right_index=True)
Out[53]:
        "sad"  "happy"
userid
1           9        2
2          12        8
3          11        9

And if you want to run this over a list of DataFrames, just reduce() them:

reduce(lambda x, y: pd.merge(x, y, left_index=True, right_index=True), list_of_dfs)

Upvotes: 2

unutbu
unutbu

Reputation: 879341

Transposing swaps the columns and rows of the DataFrame. If dfs is your list of DataFrames, then:

dfs = [df.T for df in dfs]

will make dfs a list of transposed DataFrames.

Then to merge:

merged = dfs[0]
for df in dfs[1:]:
    merged = pd.merge(merged, df, how='inner')

By default pd.merge merges DataFrames based on all columns shared in common.


Note that transposing requires copying all the data in the original DataFrame into a new DataFrame. It would be more efficient to build the DataFrame in the correct (transposed) format from the beginning (if possible), rather than fixing it later by transposing.

Upvotes: 2

Related Questions