Reputation: 2698
The title doesn't make a clear sense, but it was the best I could describe.
I have DataFrames that look like these:
data1 = np.matrix([[4,75,2,5,84,2,6,5,554],[4,6,67,6,4,5,8,5,8]]).T
data2 = np.matrix([[3,46,4,555,556,557,558,559,560],[1,2,4,1,3,5,3,1,5]]).T
data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)
>>> data1
0 1
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
>>> data2
0 1
0 3 1
1 46 2
2 4 4
3 555 1
4 556 3
5 557 5
6 558 3
7 559 1
8 560 5
I want to append data2
to the bottom of data1
. However, I want to append rows of data2
whose column1 values are greater than or equal to the 554
, which is the last row of column1 of data1
Here is the output I want:
>>> merged_df
0 1
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
9 555 1
10 556 3
11 557 5
12 558 3
13 559 1
14 560 5
So, some of the first rows of data2
will be skipped when appending to data1
.
It is assumed that the last row of data1
is its maximum, and the rows of data2
are sorted after the last value of data1
, which is 554
Is there any elegant way to do this job using Pandas toolbox?
Upvotes: 3
Views: 719
Reputation: 862541
Use concat
with filter second DataFrame
by boolean indexing
:
print (data1.iloc[-1, 0])
554
df = pd.concat([data1, data2[data2[0] > data1.iloc[-1, 0]]], ignore_index=True)
print (df)
0 1
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
9 555 1
10 556 3
11 557 5
12 558 3
13 559 1
14 560 5
Also for general solution compare by max
value:
df = pd.concat([data1, data2[data2[0] > data1[0].max()]], ignore_index=True)
Solution if custom columns names:
data1 = pd.DataFrame(data1, columns=list('ab'))
data2 = pd.DataFrame(data2, columns=list('ab'))
print (data1.iloc[-1, data1.columns.get_loc('a')])
554
data22 = data2[data2['a'] > data1.iloc[-1, data1.columns.get_loc('a')]]
df = pd.concat([data1, data22], ignore_index=True)
print (df)
a b
0 4 4
1 75 6
2 2 67
3 5 6
4 84 4
5 2 5
6 6 8
7 5 5
8 554 8
9 555 1
10 556 3
11 557 5
12 558 3
13 559 1
14 560 5
Upvotes: 3