Glenn G.
Glenn G.

Reputation: 419

Changing column value by finding substring in string values

I'm trying to change the values in a single column using pandas apply(). My function partially worked, but I'm stuck on how to fix this other half.

Data Column:

County Name Riverside County San Diego County SanFrancisco County/city

I'm trying to get rid of " County" so I'm left with just the name of the county. I successfully got rid of the " County" with the function but I'm having trouble removing the " County/city" from San Francisco.

Code:

def modify_county(countyname):
  if "/city" in countyname:
    return countyname.replace(" County/city","")
  return countyname.replace(" County","")

lfd["CountyName"] = lfd["CountyName"].apply(modify_county)

Output:

CountyName
Riverside
San Diego
San Francisco County/city

Is something wrong with the conditional in the function?

Upvotes: 0

Views: 54

Answers (3)

Glenn G.
Glenn G.

Reputation: 419

@jpp, used suggestion to apply to entire column. Not sure if this is the best way, but it worked.

lfd["CountyName"] = pd.Series(lfd["CountyName"])

lfd["CountyName"] = lfd["CountyName"].apply(lambda x: ' '.join([w for w in x.split() if not 'County' in w]))

Upvotes: 0

WillMonge
WillMonge

Reputation: 1035

@jpp's answer is the literal way of doing what you asked for. But in this case I would use pandas.Series.replace and some regex to substitute the entire thing in one go:

import pandas as pd

s = pd.Series(['Riverside County', 'San Diego County', 'SanFrancisco County/city'])

res = s.replace(' County(/city)?', '', regex=True)

Upvotes: 1

jpp
jpp

Reputation: 164663

This is an alternative way. It works with the data you have provided.

import pandas as pd

s = pd.Series(['Riverside County', 'San Diego County', 'SanFrancisco County/city'])

res = s.apply(lambda x: ' '.join([w for w in x.split() if not 'County' in w]))

print(res)

# 0       Riverside
# 1       San Diego
# 2    SanFrancisco
# dtype: object

Upvotes: 1

Related Questions