user2100039
user2100039

Reputation: 1356

Combining 2 Unequal Length Date Pandas

i have 2 unequal length pandas that I am trying to combine based on their matching hour, month, year, and plant_name. I'm not sure whether merge/concat/join is most efficient or uses less memory since my actual dataset is very large and I'm getting unexpected results with attempts like shown at the bottom here. As an example, my data looks like this for dataframe 1 (df1):

    plant_name business_name maint_region_name wind_speed_ms dataset  year  month  day  hour
0    CAETITE I        BRAZIL            BRAZIL          7.88    ERA5  2021      5   31    21
1    CAETITE I        BRAZIL            BRAZIL          7.95    ERA5  2021      5   31    20
2  RIO DO FOGO        BRAZIL            BRAZIL          7.72    ERA5  2021      1   31    19
3  RIO DO FOGO        BRAZIL            BRAZIL          7.68    ERA5  2021      1   31    18
4    CAETITE I        BRAZIL            BRAZIL          7.41    ERA5  2021      1   31    17
5    CAETITE I        BRAZIL            BRAZIL          7.32    ERA5  2021      1   31    16

And, dataframe 2 or df2:

  plant_name       vel  year  month  day  hour  
0  CAETITE I       NaN  2021      1   31    19
1  CAETITE I       NaN  2021      1   31    18
2  CAETITE I      5.42  2021      1   31    17
3  CAETITE I       NaN  2021      1   31    16

And, the final df or df_combined should look like this:

    plant_name business_name maint_region_name wind_speed_ms dataset year month day hour  vel
0    CAETITE I        BRAZIL            BRAZIL          7.41    ERA5 2021     1  31   17 5.42
1    CAETITE I        BRAZIL            BRAZIL          7.32    ERA5 2021     1  31   16  NaN

Ive tried many things like this below but each attempt is causing "df3" to have considerably large number of rows and more columns in my actual data than I expect so I must be doing something wrong with the approach below:

df3 = df2.merge(df1[["plant_name","year","month","day","hour","wind_speed_ms"]], on=["plant_name"], how="left")

thank you for your time.

Upvotes: 0

Views: 21

Answers (1)

Anton
Anton

Reputation: 606

What you are looking for is an inner-join here. Also you need to add year, month, day and hour to the join-key. The following should give you the right answer:

df1.merge(df2, on=["plant_name", "year", "month", "day", "hour"], how="inner")

Upvotes: 1

Related Questions