Reputation: 3015
I have a list of dataframes in python pandas that have the same rowname and rowvalues. What I would like to do is produce one dataframe with them innerjoined on the rowvalues. I have looked online and found the merge function, but this isn't working because my rows aren't a column. Does anyone know the best way to do this? Is the solution to take the row values and turn it into a column, and if so how do you do that? Thanks for the help.
input:
"happy"
userid
1 2
2 8
3 9
"sad"
userid
1 9
2 12
3 11
output:
"sad" "happy"
userid
1 9 2
2 12 8
3 11 9
Upvotes: 0
Views: 2743
Reputation: 37930
It looks like your DataFrames have indices, in which case your merge()
should indicate that's how it wants to proceed:
In [51]: df1
Out[51]:
"happy"
userid
1 2
2 8
3 9
In [52]: df2
Out[52]:
"sad"
userid
1 9
2 12
3 11
In [53]: pd.merge(df2, df1, left_index=True, right_index=True)
Out[53]:
"sad" "happy"
userid
1 9 2
2 12 8
3 11 9
And if you want to run this over a list of DataFrames, just reduce()
them:
reduce(lambda x, y: pd.merge(x, y, left_index=True, right_index=True), list_of_dfs)
Upvotes: 2
Reputation: 879341
Transposing swaps the columns and rows of the DataFrame. If dfs
is your list of DataFrames, then:
dfs = [df.T for df in dfs]
will make dfs
a list of transposed DataFrames.
Then to merge:
merged = dfs[0]
for df in dfs[1:]:
merged = pd.merge(merged, df, how='inner')
By default pd.merge
merges DataFrames based on all columns shared in common.
Note that transposing requires copying all the data in the original DataFrame into a new DataFrame. It would be more efficient to build the DataFrame in the correct (transposed) format from the beginning (if possible), rather than fixing it later by transposing.
Upvotes: 2