Reputation: 435
I have 1000 observations that I have analysed using 4 different models. The results of these models are stored in 4 data frames, that look something like this:
import pandas as pd
model_1 = pd.DataFrame(data={"observation_id": [0, 1, ..., 1000],
"param_1": [0, 4, ..., 2]})
model_2 = pd.DataFrame(data={"observation_id": [0, 2, ..., 1000],
"param_1": [1, 3, ..., 1]})
similarly for the other two (there are actually many more parameters in the data frames).
The "observation_id" uniquely matches the results of a model to the observation. So an observation_id of 0 gives the results of the four models for that particular observation.
The problem is that there are gaps in the modelled data. model_1 might be missing observation_id 45 and model_3 might be missing observation_id 721.
How can I create four new data frames containing just the results related to observation_ids that are common to all 4 datasets?
Upvotes: 0
Views: 42
Reputation: 323276
Here is one way using merge
with how='left'
from functools import reduce
p1 = reduce(lambda left,right: pd.merge(left,right,on='observation_id',how='left'), [df1,df2,df3,df4])
If you only want one df not four
from functools import reduce
df = reduce(lambda left,right: pd.merge(left,right,on='observation_id',how='inner'), [df1,df2,df3,df4])
Upvotes: 1