Using a Python dict to replace/clean data in a Pandas DataFrame

Question

I have a Dataframe(table2) that looks something like

57                  INVERNESS
361                 INVERNESS
533                 INVERNESS
535     INVERNESS KERRY DOWNS
758           INVERNESS GREEN
807                 INVERNESS
970           INVERNESS POINT
971                 INVERNESS

And so on..

And I need to map/replace the names using a Dict, (which I have in a Excel sheet) When I read the translate table into Pandas I get a DF that looks like

             NSUBDIVISION
SUBDIVISION 
*HUFFMAN**8MILES NE  OTHER
0                    OTHER
00                   OTHER
000                  OTHER
INVERNESS POINT      INVERNESS

And so on.. When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:

{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
  u'0': u'OTHER',
  u'00': u'OTHER',
  u'000': u'OTHER',
  u'0000': u'OTHER',
  u'INVERNESS POINT': u'INVERNESS',

And so ..on (I mention this as I'm not sure the dict is Properly formed)

I want to do something like

 table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)

I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS

currently I just get TypeError: unhashable type: 'dict'

Andy Hayden · Accepted Answer

I think you want to create a dictionary from the Series (rather than the DataFrame):

In [11]: translate_df['NSUBDIVISION'].to_dict()
Out[11]:
{'*HUFFMAN**8MILES NE': 'OTHER',
 '0': 'OTHER',
 '00': 'OTHER',
 '000': 'OTHER',
 'INVERNESS POINT': 'INVERNESS'}

And use this to replace the column:

In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
Out[12]:
0                INVERNESS
1                INVERNESS
2                INVERNESS
3    INVERNESS KERRY DOWNS
4          INVERNESS GREEN
5                INVERNESS
6                INVERNESS
7                INVERNESS
Name: SUBDIVISION, dtype: object

Using a Python dict to replace/clean data in a Pandas DataFrame

Answers (1)

Related Questions