kostya ivanov
kostya ivanov

Reputation: 707

How to combine two dataframes by index (pandas)

I have two dataframes, with the same date field, but different other fields. I need to add a column pneumonia_ARVI from dataframe pneumonia_ARVI to dataframe Result_data.

They initially differ in the number of dates, in Result_data dataframe there are significantly more dates than in pneumonia_ARVI

enter image description here

I need a concatenation with a date match, but if the records in the dataframe pneumonia_ARVI than in the dataframe Result_data, then the preference would have the dates specified in the dataset Result_data. And the data that is missing in the dataset pneumonia_ARVI replaced with empty values.

I have tried doing

Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')
End = pd.merge(Result_data, pneumonia_ARVI, left_index=True, right_index=True)

But this led to the fact that the data was adjusted to each other, and the field infected_city do not leave all their original values by date.

How to combine this data correctly so that there are no problems with reducing the total number of dates?

Upvotes: 0

Views: 48

Answers (1)

not_speshal
not_speshal

Reputation: 23146

Use join:

#convert to datetime if needed
Result_data["date"] = pd.to_datetime(Result_data["date"])
pneumonia_ARVI["date"] = pd.to_datetime(pneumonia_ARVI["date"])

#set index as you have done
Result_data = Result_data.set_index('date')
pneumonia_ARVI = pneumonia_ARVI.set_index('date')

#perform a left join
End = Result_data.join(pneumonia_ARVI, how="left")

Upvotes: 2

Related Questions