Viktor Avdulov

Reputation: 157

Add space before capital letters in a dataframe or column in python using regex

[Image: numbered table of provinces in Afghanistan; the columns are Province, Centers, and U.N. Region]

I need the values in the columns to be split where the capital letters are, so that they look like this:

West Afghanistan or North East Afghanistan

Here is what I have tried so far, but nothing changes. I would prefer not to go through every column. Is it possible to do this without a for loop, possibly using apply_all or lambda, or a combination of the two?

afg_regions['U.N. Region'].replace(('[A-z]','[A-z]*(\s)[A-z]*'),regex=True,inplace=True)

Upvotes: 0

Views: 2867

Answers (3)

kantal

Reputation: 2407

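Yet another solution, applied to every column at once (regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):

df.apply(lambda col: col.str.replace(r"([a-z])([A-Z])", r"\1 \2", regex=True))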

Out: 
              U.N. Region   Centers
0  North East Afghanistan  Fayzabad
1        West Afghanistan  Qala Naw
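
If the frame also holds non-string columns (the table in the question is numbered), the .str accessor inside apply raises an AttributeError on them. A minimal sketch of one way around that, assuming DataFrame.replace with regex=True is acceptable here: it substitutes inside string cells and leaves other values as they are. The sample values and the "Rank" column are made up for illustration.

```python
import pandas as pd

# Sample data is an assumption: run-together values like those in the question,
# plus a hypothetical numeric "Rank" column to show non-strings are left alone.
df = pd.DataFrame({
    "U.N. Region": ["NorthEastAfghanistan", "WestAfghanistan"],
    "Centers": ["Fayzabad", "QalaNaw"],
    "Rank": [1, 2],
})

# With regex=True, DataFrame.replace substitutes inside every string cell,
# so no per-column loop or .str accessor is needed.
result = df.replace(r"([a-z])([A-Z])", r"\1 \2", regex=True)
print(result)
```

This returns a new frame rather than modifying df, so assign it back if you want to keep the change.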

Upvotes: 1

jezrael

Reputation: 863166

Use Series.str.replace to put a space before each uppercase letter, then remove the leading space with Series.str.strip:

import pandas as pd

df = pd.DataFrame({'U.N.Region':['WestAfghanistan','NorthEastAfghanistan']})

# regex=True is needed in pandas 2.0+, where the default is a literal match
df['U.N.Region'] = df['U.N.Region'].str.replace(r"([A-Z])", r" \1", regex=True).str.strip()
print(df)
               U.N.Region
0        West Afghanistan
1  North East Afghanistan
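
One caveat, in case some values are already partly spaced or contain runs of capitals: putting a space before every uppercase letter and inserting one only at a lowercase-to-uppercase boundary give different results there. A small sketch with the standard re module to show the difference; the function names and the sample strings are made up for illustration.

```python
import re

def space_every_capital(text):
    # Space before each uppercase letter, then drop the leading space.
    return re.sub(r"([A-Z])", r" \1", text).strip()

def space_at_case_boundary(text):
    # Space only where a lowercase letter is directly followed by an uppercase one.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)

# Hypothetical inputs: fully joined, partly spaced, and starting with an acronym.
for sample in ["NorthEastAfghanistan", "North EastAfghanistan", "UNRegion"]:
    print(repr(space_every_capital(sample)), repr(space_at_case_boundary(sample)))
```

On the partly spaced input the first function leaves a double space, and on the acronym it splits every letter, while the boundary version leaves the acronym untouched. Which behaviour is right depends on the data.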

Upvotes: 5

Emma

Reputation: 27743

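Another option is to use lookarounds, which match the empty position between a lowercase and an uppercase letter, so nothing has to be captured and put back:

import pandas as pd

df = pd.DataFrame({'U.N.Region': ['WestAfghanistan', 'NorthEastAfghanistan']})

# regex=True is needed in pandas 2.0+, where the default is a literal match
df['U.N.Region'] = df['U.N.Region'].str.replace(
    r"(?<=[a-z])(?=[A-Z])", " ", regex=True)
print(df)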

Upvotes: 1
