Raviprasad M R
Raviprasad M R

Reputation: 44

Split a column and append to existing values in other columns using pandas

I am facing some hard challenge in data cleaning using pandas. I need to work on the ZIP, State and City columns in the dataframe (as attached).

This is the glimpse of the data:

    City         State        ZIP   
0   PEARLAND       TX        77584
1   HOT SPRINGS    AR        7191
2   PALMDALE       CA        93551
3   JONESBOROY     AR        72401
4   WATSONVILLE    CA        95076
5   SACRAMENTO     UCA       95823      
6   AGOURAT       HILLS      CA 91301
7   JSAN          DIEGO      CA 92114
8   NEW           PORT       RICHEY FL 34655
9   BURBANK        CA        91501

Challenge I am facing: The Zip column should contain only the Zip number, The state column should have only 2 digit state code, and city should have valid city name.

For example:

    City          State        ZIP
6   AGOURAT       HILLS      CA 91301
7   JSAN          DIEGO      CA 92114
8   NEW           PORT       RICHEY FL 34655

should be -

    City               State        ZIP
6   AGOURAT HILLS      CA         91301
7   JSAN DIEGO         CA         92114
8   NEW PORT RICHEY    FL         34655

So can anyone help me how can I achieve this using Pandas dataframe?

Upvotes: 1

Views: 254

Answers (1)

Vaishali
Vaishali

Reputation: 38415

The best option would be to read the data once again using regex. If not, redo.

df['combined'] = df.apply(' '.join, axis = 1)
df[['City', 'State', 'Zip']] = df['combined'].str.extract('([A-Z ]+)\s+([A-Z]{2})\s+(\d+)', expand = True)
df.drop('combined', axis = 1, inplace = True)



    City            State   ZIP
0   PEARLAND        TX  77584
1   HOT SPRINGS     AR  7191
2   PALMDALE        CA  93551
3   JONESBOROY      AR  72401
4   WATSONVILLE     CA  95076
5   NaN             NaN NaN
6   AGOURAT HILLS   CA  91301
7   JSAN DIEGO      CA  92114
8   NEW PORT RICHEY FL  34655
9   BURBANK         CA  91501

Upvotes: 1

Related Questions