Reputation: 49
I am new to python and Stackoverflow. I have a problem with handling the data. I have 2 sets of data with different size. df1 has a size of 1000 and df2 has a size of 100000. Here are the samples of df1 and df2.
df1=
Date x y
0 2020-01-01 01:01 1.1 2.4
1 2020-01-01 01:05 4.2 5.5
2 2020-01-01 01:08 7.3 8.6
df2=
Date x y
0 2020-01-01 01:00 NaN NaN
1 2020-01-01 01:01 NaN NaN
2 2020-01-01 01:02 NaN NaN
3 2020-01-01 01:03 NaN NaN
4 2020-01-01 01:04 NaN NaN
5 2020-01-01 01:05 NaN NaN
6 2020-01-01 01:06 NaN NaN
7 2020-01-01 01:07 NaN NaN
8 2020-01-01 01:08 NaN NaN
9 2020-01-01 01:09 NaN NaN
10 2020-01-01 01:10 NaN NaN
What I would like to do is to combine them together as a new data frame, if df1['Date']=df2['Date']
, df3 will be shown as follow.
df3=
Date x y
0 2020-01-01 01:00 NaN NaN
1 2020-01-01 01:01 1.1 2.4
2 2020-01-01 01:02 NaN NaN
3 2020-01-01 01:03 NaN NaN
4 2020-01-01 01:04 NaN NaN
5 2020-01-01 01:05 4.2 5.5
6 2020-01-01 01:06 NaN NaN
7 2020-01-01 01:07 NaN NaN
8 2020-01-01 01:08 7.3 8.6
9 2020-01-01 01:09 NaN NaN
10 2020-01-01 01:10 NaN NaN
Then, the NaN values will be equal to the closest value above
df3=
Date x y
0 2020-01-01 01:00 NaN NaN
1 2020-01-01 01:01 1.1 2.4
2 2020-01-01 01:02 1.1 2.4
3 2020-01-01 01:03 1.1 2.4
4 2020-01-01 01:04 1.1 2.4
5 2020-01-01 01:05 4.2 5.5
6 2020-01-01 01:06 4.2 5.5
7 2020-01-01 01:07 4.2 5.5
8 2020-01-01 01:08 7.3 8.6
9 2020-01-01 01:09 7.3 8.6
10 2020-01-01 01:10 7.3 8.6
Thx a lot!
Upvotes: 0
Views: 79
Reputation: 2287
One way, would be to use update
on your complete df (assuming it includes all indices). Then use fillna
to get the previous values for all your missings:
a = pd.DataFrame(
{
"date": pd.date_range(start="2020-01-01", periods=3),
"x": [1, np.nan, 3],
"y": [5, np.nan, 6],
}
).set_index("date")
b = pd.DataFrame(
{
"date": pd.date_range(start="2020-01-01", periods=5),
"x": [np.nan] * 5,
"y": [np.nan] * 5,
}
).set_index("date")
print(a, b)
| date | x | y |
|:--------------------|----:|----:|
| 2020-01-01 00:00:00 | 1 | 5 |
| 2020-01-02 00:00:00 | nan | nan |
| 2020-01-03 00:00:00 | 3 | 6 |
| date | x | y |
|:--------------------|----:|----:|
| 2020-01-01 00:00:00 | nan | nan |
| 2020-01-02 00:00:00 | nan | nan |
| 2020-01-03 00:00:00 | nan | nan |
| 2020-01-04 00:00:00 | nan | nan |
| 2020-01-05 00:00:00 | nan | nan |
b.update(a)
b = b.fillna(method="ffill")
print(b)
| date | x | y |
|:--------------------|----:|----:|
| 2020-01-01 00:00:00 | 1 | 5 |
| 2020-01-02 00:00:00 | 1 | 5 |
| 2020-01-03 00:00:00 | 3 | 6 |
| 2020-01-04 00:00:00 | 3 | 6 |
| 2020-01-05 00:00:00 | 3 | 6 |
Upvotes: 1