Reputation: 814
I have a dataframe in Python, say A
, which has multiple columns, including columns named ECode
and FG
. I have another Pandas dataframe B
, also with multiple columns, including columns named ECode
,F Gping
(note the space in column name for F Gping
) and EDesc
. What I would like to do is to create a new column called EDesc
in dataframe A based on following conditions (Note that EDesc
, FG
and F Gping
contain String
type values (text), while the remaining columns are numeric/floating type. Also, dataframes A
and B
are of different dimensions (with differing rows and columns, and I want to check equality in specific values in the dataframe columns):-
A
, where value in ECode
matches value ECode
in dataframe B
, then, in the new column EDesc
to be created in dataframe A
, add the same values as EDesc
in B
.A
where value in FG
matches F Gping
values, in the new column EDesc
in A
, add same values as EDesc
in B
.EDesc
column in A
still has missing values/NaNs, then add the string value MissingValue
to all the rows in the Dataframe A
's EDesc
column.I have tried using for
loops, as well as list comprehensions, but they don't help in accomplishing this. Moreover, the space within column name F Gping
in B
is created problems to access the same, as though I can access it like B['F Gping']
, it isn't solving the very purpose. Any help in this regard is appreciated.
Upvotes: 0
Views: 268
Reputation: 1405
I'm assuming values are unique in B['ECode'] and B['F Gping'], otherwise we'll have to choose which value we give to A['EDesc'] when we find two matching values for ECode or FG.
There might be a smarter way but here's what I would do with joins:
Example DataFrames:
A = pd.DataFrame({'ECode': [1, 1, 3, 4, 6],
'FG': ['a', 'b', 'c', 'b', 'y']})
B = pd.DataFrame({'ECode': [1, 2, 3, 5],
'F Gping': ['b', 'c', 'x', 'x'],
'EDesc': ['a', 'b', 'c', 'd']})
So they look like:
A
ECode FG
0 1 a
1 1 b
2 3 c
3 4 b
4 6 y
B
ECode F Gping EDesc
0 1 b a
1 2 c b
2 3 x c
3 5 x d
First let's create A['EDesc'] by saying that it's the result of joining A and B on ECode. We'll temporarily use EDesc as index:
A.set_index('ECode', inplace=True, drop=False)
B.set_index('ECode', inplace=True, drop=False)
A['EDesc'] = A.join(B, lsuffix='A')['EDesc']
This works because the result of A.join(B, lsuffix='A') is:
ECodeA FG ECode F Gping EDesc
ECode
1 1 a 1.0 b a
1 1 b 1.0 b a
3 3 c 3.0 x c
4 4 b NaN NaN NaN
6 6 y NaN NaN NaN
Now let's fillna on A['EDesc'], using the match on FG. Same thing:
A.set_index('FG', inplace=True, drop=False)
B.set_index('F Gping', inplace=True, drop=False)
A['EDesc'].fillna(A.join(B, lsuffix='A')['EDesc'].drop_duplicates(), inplace=True)
This works because the result of A.join(B, lsuffix='A') is:
ECodeA FG EDescA ECode F Gping EDesc
a 1 a a NaN NaN NaN
b 1 b a 1.0 b a
b 4 b NaN 1.0 b a
c 3 c c 2.0 c b
y 6 y NaN NaN NaN NaN
Also we dropped the duplicates because as you can see there are two b's in our index.
Finally let's fillna with "Missing" and reset the index:
A['EDesc'].fillna('Missing', inplace=True)
A.reset_index(drop=True, inplace=True)
Upvotes: 1