ASH
ASH

Reputation: 20342

Can we merge two dataframes based on text matches?

I have one DF that looks like this:

    ID  Rating  Bin Price
0   864890  AA+ 4   97.14
1   691634  AA+ 4   14.21
2   792845  AA+ 3   101.25
3   506251  SP  3   100.31
4   689977  AA+ 3   97.37
    ... ... ... ...
249995  873393  AA+ 5   110.42
249996  495709  AA+ 7   105.47
249997  508123  AA+ 7   104.55
249998  650062  AA+ 8   105.37
249999  17658   AA+ 8   103.53

I have another DF that looks like this.

    Rating  RatingScores
0   AAA 10
1   AA+ 9.5
2   AA  9
3   A+  8.5
4   A   8
..  ... ...
20  CC- 0
21  D   0
22  NA  0
23  NR  0
24  SP  0

I am wondering if there is a way to merge the second DF into the first. The indexes will NOT match, but the Rating fields are the same in both. Or, is there an easy way to add a column to the first DF, and generate a numeric result (RatingScores) for a non-numeric label (Rating)?

So far, I have tried this:

dataset['RatingScores'] = pd.merge(dataset, finalDF, on='Rating')

I'm getting this now.

dataset['RatingScores'] = pd.merge(dataset, finalDF, on='Rating')
Traceback (most recent call last):

  File "<ipython-input-327-0afd66ad6da1>", line 1, in <module>
    dataset['RatingScores'] = pd.merge(dataset, finalDF, on='Rating')

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3472, in __setitem__
    self._set_item(key, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3550, in _set_item
    NDFrame._set_item(self, key, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 3381, in _set_item
    self._data.set(key, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 1072, in set
    self.insert(len(self.items), item, value)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 1181, in insert
    block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 3267, in make_block
    return klass(values, ndim=ndim, placement=placement)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 2775, in __init__
    super().__init__(values, ndim=ndim, placement=placement)

  File "C:\Users\rshuell\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 128, in __init__
    "{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))

ValueError: Wrong number of items passed 12, placement implies 1

So...

(dataset.to_dict())

  997: 5,
  998: 4,
  999: 5,
  1000: 3,
  1001: 3,
  1002: 4,
  1003: 5,
  1004: 5,
  1005: 1,
  1006: 3,
  1007: 4,
  1008: 4,
  1009: 1,
  1010: 9,
  1011: 1,
  1012: 6,
  1013: 1,
  1014: 2,
  1015: 3,
  1016: 1,
  1017: 10,
  1018: 1,
  1019: 3,
  ...}}

Also...

(finalDF.to_dict())

{'Rating': {0: 'AAA',
  1: 'AA+',
  2: 'AA',
  3: 'A+',
  4: 'A',
  5: 'A-',
  6: 'AA-',
  7: 'BBB',
  8: 'BB+',
  9: 'BB',
  10: 'B+',
  11: 'B',
  12: 'B-',
  13: 'BB-',
  14: 'CCC',
  15: 'CC+',
  16: 'CC',
  17: 'C+',
  18: 'C',
  19: 'C-',
  20: 'CC-',
  21: 'D',
  22: 'NA',
  23: 'NR',
  24: 'SP'},
 'RatingScores': {0: 10.0,
  1: 9.5,
  2: 9.0,
  3: 8.5,
  4: 8.0,
  5: 7.5,
  6: 7.0,
  7: 6.5,
  8: 6.0,
  9: 5.5,
  10: 5.0,
  11: 4.5,
  12: 4.0,
  13: 3.5,
  14: 3.0,
  15: 2.5,
  16: 2.0,
  17: 1.5,
  18: 1.0,
  19: 0.0,
  20: 0.0,
  21: 0.0,
  22: 0.0,
  23: 0.0,
  24: 0.0}}

Upvotes: 0

Views: 150

Answers (2)

sunith kumar
sunith kumar

Reputation: 3

I would look at isolating the columns that are needed in the second dataframe and then using "join" to combine both dataframes.

dat1.join(dat2)

Upvotes: 0

Darren Christopher
Darren Christopher

Reputation: 4811

Your code is already correct, but pd.merge() returns a new joined DataFrame and you put that into a pd.Series, hence ValueError.

So, what you can do is putting it into a new variable to hold a merged DataFrame

merged_df = pd.merge(dataset, finalDF, on='Rating')

Upvotes: 2

Related Questions