P A N
P A N

Reputation: 5922

Pandas map to DataFrame from dictionary

I am currently mapping and renaming various string values to columns in pandas via this function:

df["fundbenchmark"] = df["name"].map(lambda x: "American Express" if "AXP" in x else "Apple" if "AAPL" in x else "Google" if "GOOG" in x else "")

I will however do this for several columns and for many different renamings. Here is another column with identical content, so it would be repetitive to add all companies there also.

df["subclass"] = df["name"].map(lambda x: "American Express" if "AXP" in x else "Apple" if "AAPL" in x else "Google" if "GOOG" in x else "")

As such I would like to maintain a dictionary like:

companies = {"AXP": "American Express", "AAPL": "Apple", "GOOG": "Google"}

and call that for all instances when I need to map company names. How can I make it mapfor matches in companies instead of lambda x?


Current DataFrame:

Name              
"BULL AXP UN X3 VON"
"BEAR AXP UN X3 VON"
"BULL GOOG UN X5 VON"
"BEAR GOOG UN X5 VON"
"BEAR ABC123 X2 CBZ"

Desired output:

Name                    Fundbenchmark             Subclass
"BULL AXP UN X3 VON"      "American Express"      "American Express"
"BEAR AXP UN X3 VON"      "American Express"      "American Express"
"BULL GOOG UN X5 VON"     "Google"                "Google"
"BEAR GOOG UN X5 VON"     "Google"                "Google"
"BEAR ABC123 X2 CBZ"      "BEAR ABC123 X2 CBZ"    "BEAR ABC123 X2 CBZ" #Not in Dictionary

Dictionary:

companies = {"AXP": "American Express", "GOOG": "Google"} 

So if abbreviation exists in dictionary, then write that name to other columns.

Or, if the abbreviation is not in the dictionary, duplicate the whole cell.

Upvotes: 1

Views: 7558

Answers (3)

Moondra
Moondra

Reputation: 4511

You can use map

Current DataFrame:

Name              
"BULL AXP UN X3 VON"
"BEAR AXP UN X3 VON"
"BULL GOOG UN X5 VON"
"BEAR GOOG UN X5 VON"
"BEAR ABC123 X2 CBZ"


companies = {"AXP": "American Express", "GOOG": "Google"} 

we create a new column that extract the tickers from your column.

df['Tickers'] = df.Name.str.split(' ').apply(lambda x: x[1])

We then use map your dictionary companies to the tickers to create a column with the names of the ticker symbols:

df['Ticker_Name'] = df['Tickers'].map(companies)

Upvotes: 2

JohnE
JohnE

Reputation: 30404

I doubt this is the most elegant way, but it should do the trick:

df['fbm'] = df['name']
for i in companies:
    df.loc[ df.name.str.contains(i), 'fbm' ] = companies[i]

                  name                 fbm
0   BULL AXP UN X3 VON    American Express
1   BEAR AXP UN X3 VON    American Express
2  BULL GOOG UN X5 VON              Google
3  BEAR GOOG UN X5 VON              Google
4   BEAR ABC123 X2 CBZ  BEAR ABC123 X2 CBZ

One thing to keep in mind here is that because this is not a dictionary lookup, you could have more than one match. For example, 'ABC' and 'UN' are both valid tickers and 'BEAR' is or was. With this method, the last match will be kept and any prior matches discarded.

Upvotes: 1

Liam Foley
Liam Foley

Reputation: 7822

Use Replace:

http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.replace.html

 df = pd.DataFrame({'A':['string1','string2','string3'],
                 'B':['StringA','StringB','StringC']})

Creates:

             A        B
    0  string1  StringA
    1  string2  StringB
    2  string3  StringC

Then map your replacements in a dictionary:

to_replace = {'string1':'replace1','StringC':'replaceC'}

Then replace:

 df.replace(to_replace)
          A         B
0  replace1   StringA
1   string2   StringB
2   string3  replaceC

Upvotes: 1

Related Questions