Reputation: 2690
I have two DataFrames and want to check whether each column of the first (e.g. df1["A"]) also exists in the second. If it does not, I want to add that column to df2 and fill it with 0.
I got it working with an ugly for loop and tried to optimize it, but I cannot figure out how.
testing_list = list(testing_df.columns)
for i in range(len(training_df.columns)):
    if not training_df.columns[i] in testing_list:
        testing_df[training_df.columns[i]] = 0
Upvotes: 0
Views: 36
Reputation: 862921
Use DataFrame.reindex with new columns created by Index.union:
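import pandas as pd

testing_df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'F': list('aaabbb')
})
training_df = pd.DataFrame({
    'A': list('abcdef'),
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
})

# Union of both column sets, keeping testing_df's columns first (no sorting)
cols = testing_df.columns.union(training_df.columns, sort=False)
# Columns missing from testing_df are created and filled with 0
df = testing_df.reindex(cols, axis=1, fill_value=0)
print(df)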
   A  B  F  C  D
0  a  4  a  0  0
1  b  5  a  0  0
2  c  4  a  0  0
3  d  5  b  0  0
4  e  5  b  0  0
5  f  4  b  0  0
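As a quick sanity check, the reindex approach can be compared against the loop from the question. This is only a sketch on the sample data above. The names looped, reindexed and same are made up for the illustration, and the loop runs on a copy so that testing_df is left unchanged.

```python
import pandas as pd

testing_df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'F': list('aaabbb'),
})
training_df = pd.DataFrame({
    'A': list('abcdef'),
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
})

# Loop from the question, applied to a copy (hypothetical name: looped)
looped = testing_df.copy()
for col in training_df.columns:
    if col not in looped.columns:
        looped[col] = 0

# Vectorized version from the answer (hypothetical name: reindexed)
cols = testing_df.columns.union(training_df.columns, sort=False)
reindexed = testing_df.reindex(cols, axis=1, fill_value=0)

# Both approaches should give the same columns, order and values
same = looped.equals(reindexed)
print(same)
```

With sort=False the original column order of testing_df is kept and the new columns are appended, which is also what the loop does.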
If you want to add the missing columns to both DataFrames, with the columns sorted, use DataFrame.align:
testing_df, training_df = testing_df.align(training_df, fill_value=0)
print(testing_df)
   A  B  C  D  F
0  a  4  0  0  a
1  b  5  0  0  a
2  c  4  0  0  a
3  d  5  0  0  b
4  e  5  0  0  b
5  f  4  0  0  b
print(training_df)
   A  B  C  D  F
0  a  0  7  1  0
1  b  0  8  3  0
2  c  0  9  5  0
3  d  0  4  7  0
4  e  0  2  1  0
5  f  0  3  0  0
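One caveat: called without axis, align lines up the row index as well as the columns. That makes no difference above because both frames share the same index, but it can add rows when the indexes differ. A minimal sketch, assuming only the columns should be aligned, passes axis=1. The smaller frames and the names test_aligned and train_aligned are made up for the illustration.

```python
import pandas as pd

# Hypothetical frames with different numbers of rows
testing_df = pd.DataFrame({'A': list('abc'), 'B': [4, 5, 4]})
training_df = pd.DataFrame({'A': list('abcde'), 'C': [7, 8, 9, 4, 2]})

# axis=1 restricts the alignment to columns, so row counts are untouched
test_aligned, train_aligned = testing_df.align(
    training_df, join='outer', axis=1, fill_value=0
)

print(test_aligned.shape, train_aligned.shape)
print(list(test_aligned.columns))
```

Both results end up with the same set of columns, while each keeps its own rows.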
Upvotes: 1