Reputation: 730
I really stuck in this problem and dont have any idea how to solve that. I have two data frame, one is for the humidity and its data are reported every 15 minutes. I have three different sensors for reporting. So, the table includes the id, the date, and hour of the reporting. Here is:
df_h = pd.DataFrame({'id_h': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3}, 'date': {0: '2021-01-01', 1: '2021-01-01', 2: '2021-01-01', 3: '2021-01-01', 4: '2021-01-01', 5: '2021-01-01'}, 'time_hour': {0: '6:00:00', 1: '6:15:00', 2: '6:00:00', 3: '6:15:00', 4: '6:00:00', 5: '6:15:00'}, 'VALUE': {0: 10, 1: 12, 2: 20, 3: 22, 4: 30, 5: 32}})
id_h date time_hour VALUE
0 1 2021-01-01 6:00:00 10
1 1 2021-01-01 6:15:00 12
2 2 2021-01-01 6:00:00 20
3 2 2021-01-01 6:15:00 22
4 3 2021-01-01 6:00:00 30
5 3 2021-01-01 6:15:00 32
with the following code, I can stick its data together and for each id, in each day, I have the humidity.
humidity_sticked = df_h.pivot(index=["id_h", "date"], columns="time_hour", values="VALUE")
humidity_sticked.columns = [f"value_{i+1}" for i in range(humidity_sticked.shape[1])]
humidity_sticked =humidity_sticked.reset_index()
As we can see, we have a table with three rows and two columns.
Also, I have another table which shows the temperature. But, the id for the weather center is different. For example, for id_h (id of humidity) = 1, 2 we only have the id_t (id of temperature) = 5. So, we have exact same table for the temperature, but since the ids are different, I can not create the same stick table as humidity. Here is the table for the temperature:
df_t = pd.DataFrame({'id_t': {0: 5, 1: 5, 2: 5, 3: 5, 4: 7}, 'date': {0: '2021-01-01', 1: '2021-01-01', 2: '2021-01-01', 3: '2021-01-01', 4: '2021-01-01'}, 'time_hour': {0: '6:00:00', 1: '6:15:00', 2: '6:00:00', 3: '6:15:00', 4: '6:00:00'}, 'VALUE': {0: -1, 1: -8, 2: -2, 3: -9, 4: -3}})
id_t date time_hour VALUE
0 5 2021-01-01 6:00:00 -1
1 5 2021-01-01 6:15:00 -8
2 5 2021-01-01 6:00:00 -2
3 5 2021-01-01 6:15:00 -9
4 7 2021-01-01 6:00:00 -3
When I want to stick the values for id_t=5, I got an error. The desired output which I want is:
Explanation: for id_h=1,2 we have two 5. So, for the first two rows we consider as 1, the second two rows as id=2 and the last two rows are for id=3 which are for id_t=7.
Any help can save me!Thanks
update: I've used the merge by the index, however, when I have missing values in one of the data frame, (for example for a specific date, at time 6:00 I have the humidity, but I don't have the temperature). The results are wrong.
Here is the the results of the merge by the index, we can see that the time is not same, but it put all them in one row.
Upvotes: 0
Views: 68
Reputation: 136
You can use the pd.merge
by index
. This way is the shortcut to make your 'sticked dataframe'.
pd.merge(df_t, df_h, left_index=True, right_index=True, suffixes=['_t', '_h'])
Output:
id_t date_t time_hour_t VALUE_t id_h date_h time_hour_h \
0 5 2021-01-01 6:00:00 -1 1 2021-01-01 6:00:00
1 5 2021-01-01 6:15:00 -8 1 2021-01-01 6:15:00
2 5 2021-01-01 6:00:00 -2 2 2021-01-01 6:00:00
3 5 2021-01-01 6:15:00 -9 2 2021-01-01 6:15:00
4 7 2021-01-01 6:00:00 -3 3 2021-01-01 6:00:00
VALUE_h
0 10
1 12
2 20
3 22
4 30
The output above contains useless columns, so you can merge df_t
and df_h[only you need to merge]
like below:
pd.merge(df_t, df_h[['id_h','VALUE']], left_index=True, right_index=True, suffixes=['_t', '_h'])
Output:
id_t date time_hour VALUE_t id_h VALUE_h
0 5 2021-01-01 6:00:00 -1 1 10
1 5 2021-01-01 6:15:00 -8 1 12
2 5 2021-01-01 6:00:00 -2 2 20
3 5 2021-01-01 6:15:00 -9 2 22
4 7 2021-01-01 6:00:00 -3 3 30
This is the simplest way you want.
Upvotes: 1
Reputation: 13242
df_t['rank'] = df_t.id_t.rank(method='dense')
df_h['rank'] = df_h.id_h.rank(method='dense')
df = df_t.merge(df_h, on=['rank', 'date', 'time_hour'], suffixes=['_1', '_2'])
print(df)
Output:
id_t date time_hour VALUE_1 rank id_h VALUE_2
0 5 2021-01-01 6:00:00 -1 1.0 1 10
1 5 2021-01-01 6:00:00 -2 1.0 1 10
2 5 2021-01-01 6:15:00 -8 1.0 1 12
3 5 2021-01-01 6:15:00 -9 1.0 1 12
4 7 2021-01-01 6:00:00 -3 2.0 2 20
Upvotes: 2