Manuel Weinkauf
Manuel Weinkauf

Reputation: 350

How to use pd.DataFrame.replace on a column containing lists

I am having trouble with pandas replace-function. Let's say we have an example dataframe like this:

df = pd.DataFrame({'State': ['Georgia', 'Alabama', 'Tennessee'],
                   'Cities': [['Atlanta', 'Albany'], ['Montgomery', 'Huntsville', 'Birmingham'], ['Nashville', 'Knoxville']]})

>>> df
       State                                Cities
0    Georgia                     [Atlanta, Albany]
1    Alabama  [Montgomery, Huntsville, Birmingham]
2  Tennessee                [Nashville, Knoxville]

Now I want to replace the state names and city names all by abbreviations. I have two dictionaries that define the replacement values:

state_abbrv = {'Alabama': 'AL', 'Georgia': 'GA', 'Tennessee': 'TN'}
city_abbrv = {'Albany': 'Alb.', 'Atlanta': 'Atl.', 'Birmingham': 'Birm.',
              'Huntsville': 'Htsv.', 'Knoxville': 'Kxv.',
              'Montgomery': 'Mont.', 'Nashville': 'Nhv.'}

When using pd.DataFrame.replace() on the "States" column (which only contains one value per row) it works as expected and replaces all state names:

>>> df.replace({'State': state_abbrv})

  State                                Cities
0    GA                     [Atlanta, Albany]
1    AL  [Montgomery, Huntsville, Birmingham]
2    TN                [Nashville, Knoxville]

I was hoping that it would also individually replace all matching names within the lists of the "Cities" column, but unfortunately it does not seem to work as all cities remain unabbreviated:

>>> df.replace({'Cities': city_abbrv})
      State                                Cities
0    Georgia                     [Atlanta, Albany]
1    Alabama  [Montgomery, Huntsville, Birmingham]
2  Tennessee                [Nashville, Knoxville]

How do I get the pd.DataFrame.replace() function to individually circle through all list elements in the column per row and replace accordingly?

Upvotes: 0

Views: 91

Answers (1)

not_speshal
not_speshal

Reputation: 23156

Try:

  1. explode to split the list into individual rows
  2. replace each column using the relevant dictionary
  3. groupby and agg to get back the original structure
>>> output = df.explode("Cities").replace({"State": state_abbrv, "Cities": city_abbrv}).groupby("State", as_index=False)["Cities"].agg(list)
  State                 Cities
0    AL  [Mont., Htsv., Birm.]
1    GA           [Atl., Alb.]
2    TN           [Nhv., Kxv.]

Upvotes: 1

Related Questions