Reputation: 359
I want to append rows from second DataFrame (df2) to first DataFrame (df1) depending whether in df1 column "isValid" is [T]rue.
I know how to iterate over df1 column and search for True values, but don't know how to easily append rows from second DataFrame. Originally my data have around 1000 lines and 40 columns, so I need to do operations automatically.
import pandas
df1 = pandas.read_csv('df1.csv', sep=';')
df2 = pandas.read_csv('df2.csv', sep=';')
print(df1.to_string(), '\n')
print(df2.to_string(), '\n')
columnSeriesObj = df1.iloc[:, 2]
n = 0
k = 0
for i in columnSeriesObj:
if i == "T":
print("True in row number", k)
# APPEND n ROW from df2 to k ROW from df1
n += 1
k += 1
print('\n', df1.to_string())
Here are some test values:
df1.csv
DataA;DataB;isValid
1568;1104;F
1224;1213;F
1676;1246;F
1279;1489;T
1437;1890;T
1705;1007;F
1075;1720;F
1361;1983;F
1966;1751;F
1938;1564;F
1894;1684;F
1189;1803;F
1275;1138;F
1085;1748;T
1337;1775;T
1719;1975;F
1045;1187;F
1426;1757;F
1410;1363;F
1405;1025;F
1699;1873;F
1777;1464;F
1925;1310;T
df2.csv
Nr;X;Y;Z;A ;B;C
1;195;319;18;qwe;hjk;wsx
2;268;284;23;rty;zxc;edc
3;285;277;36;uio;vbn;rfv
4;143;369;34;asd;mlp;tgb
5;290;247;16;fgh;qaz;yhn
I want to df1 after appending look like this (screenshot from Excel):
Thank you for any suggestions! :D
Upvotes: 2
Views: 81
Reputation: 71689
You can filter the index values in df1
where the column isValid
equals T
, then update the index of df2
with the filtered index values from df1
finally join
it with df1
:
m = df1['isValid'].eq('T')
idx = m[m].index[:len(df2)]
df1.join(df2.set_index(idx)).fillna('')
DataA DataB isValid Nr X Y Z A B C
0 1568 1104 F
1 1224 1213 F
2 1676 1246 F
3 1279 1489 T 1 195 319 18 qwe hjk wsx
4 1437 1890 T 2 268 284 23 rty zxc edc
5 1705 1007 F
6 1075 1720 F
7 1361 1983 F
8 1966 1751 F
9 1938 1564 F
10 1894 1684 F
11 1189 1803 F
12 1275 1138 F
13 1085 1748 T 3 285 277 36 uio vbn rfv
14 1337 1775 T 4 143 369 34 asd mlp tgb
15 1719 1975 F
16 1045 1187 F
17 1426 1757 F
18 1410 1363 F
19 1405 1025 F
20 1699 1873 F
21 1777 1464 F
22 1925 1310 T 5 290 247 16 fgh qaz yhn
Upvotes: 1
Reputation: 202
I suggest the following:
I created some dummy data, similar to yours:
import pandas as pd
import random
df = pd.DataFrame({"a": list(range(20)), "b": [random.choice(("T", "F")) for _ in range(20)]})
df2 = pd.DataFrame({"value1": list(range(5)), "nr": list(range(5))})
First you create a new column in the first dataframe that holds the incrementing ID ("Nr"). To do so, use the count
generator from itertools.
from itertools import count
counter = count(start=1)
df["id"] = df.apply(lambda row: next(counter) if row["b"] == "T" else None, axis=1)
After that you can perform a join with the merge
method.
df.merge(df2, left_on="id", right_on="nr", how="outer")
Upvotes: 1
Reputation: 2970
How about something like this:
(e.g. first find the overlapping index-values and then join the dataframes)
import pandas as pd
import numpy as np
df1 = pd.read_csv("df1.csv", sep=';')
df2 = pd.read_csv(r"df2.csv", sep=';')
# find intersecting indices
useidx = np.intersect1d(df2.index,
df1[df1.isValid == 'T'].index)
# join relevant values
df_joined = df1.join(df2.loc[useidx])
df_joined
then looks like this:
>>> DataA DataB isValid Nr X Y Z A B C
>>> 0 1568 1104 F NaN NaN NaN NaN NaN NaN NaN
>>> 1 1224 1213 F NaN NaN NaN NaN NaN NaN NaN
>>> 2 1676 1246 F NaN NaN NaN NaN NaN NaN NaN
>>> 3 1279 1489 T 4.0 143.0 369.0 34.0 asd mlp tgb
>>> 4 1437 1890 T 5.0 290.0 247.0 16.0 fgh qaz yhn
>>> 5 1705 1007 F NaN NaN NaN NaN NaN NaN NaN
>>> 6 1075 1720 F NaN NaN NaN NaN NaN NaN NaN
>>> 7 1361 1983 F NaN NaN NaN NaN NaN NaN NaN
Upvotes: 0