Reputation: 3577
My first DataFrame looks like this:
ESD Species Key1
Sub POAP SubPOAP
Sub FRAP SubFRAP
Sand POAP SandPOAP
Sand ACER SandACER
Sand SALI SandSALI
So the Key1
is the concatenation of ESD and Species.
My second DataFrame looks like this:
ESD Species Values Key
Sub POAP 10 SubPOAP
Sub FRAP 20 SubFRAP
So I want to join the second frame to the first based on Key1
and Key
but the first DataFrame is larger and I still want to retain the columns that don't match (with ESD as Sand) in the new DataFrame. Eventually I will merge anything with an ESD of Sand too.
So far I have used this code and set 'Key1' and 'Key' as indexes:
merge=pd.merge(df, df2, left_index='True', right_index='True')
but this does not retain columns that don't have a matching key. Would perhaps a join instead of a merge be what I need?
This code results in:
ESD Species Key1 Values
Sub POAP SubPOAP 10
Sub FRAP SubFRAP 20
The desired output is:
ESD Species Key1 Values
Sub POAP SubPOAP 10
Sub FRAP SubFRAP 20
Sand POAP SandPOAP NaN
Sand ACER SandACER NaN
Sand SALI SandSALI NaN
I don't really care if the duplicate column heading carry over to the join, I will just remove them (there are two columns named Species).
Upvotes: 0
Views: 2964
Reputation: 879143
Instead of concatenating ESD
and Species
into a single string,
it is better to merge on a list of columns:
result = pd.merge(df, df2, on=['ESD', 'Species'], how='left')
To keep one row in result
for each row in df
(i.e. perform a left-join), use how='left'
.
import pandas as pd
try:
# for Python2
from cStringIO import StringIO
except ImportError:
# for Python3
from io import StringIO
df = pd.read_table(StringIO('''\
ESD Species Key1
Sub POAP SubPOAP
Sub FRAP SubFRAP
Sand POAP SandPOAP
Sand ACER SandACER
Sand SALI SandSALI'''), sep='\s+')
df2 = pd.read_table(StringIO('''\
ESD Species Values Key
Sub POAP 10 SubPOAP
Sub FRAP 20 SubFRAP'''), sep='\s+')
result = pd.merge(df, df2, on=['ESD', 'Species'], how='left')
print(result)
yields
ESD Species Key1 Values Key
0 Sub POAP SubPOAP 10 SubPOAP
1 Sub FRAP SubFRAP 20 SubFRAP
2 Sand POAP SandPOAP NaN NaN
3 Sand ACER SandACER NaN NaN
4 Sand SALI SandSALI NaN NaN
Note that if df
looked like this,
ESD Species Key1
FOO BAR FOOBAR
FOOB AR FOOBAR
then two different (ESD, Species)
pairs could be mapped to the same Key1
--
i.e. FOOBAR
. Thus concatenating strings to produce a key for the purpose of
merging may lead to errors.
This may or may not affect you, but since it can in theory lead to errors, and
since you can merge on ['ESD', 'Species']
directly, it is better to avoid the
concatenation.
Upvotes: 2