Avinash Clinton
Avinash Clinton

Reputation: 543

split column values into one to one mapping

An extension to following question : Split (explode) pandas dataframe string entry to separate rows

df :

    STATE CITY  ALT_NAMES
0   S1    C1    A1@A2
1   S2    C2    A3@A4@A5 

How can I achieve following result:

out_df :

    STATE  CITY  CITY_VAR
0   S1     C1    A1
1   S1     C1    A2
2   S2     C2    A3
3   S2     C2    A4
4   S2     C2    A5

sample data :

    STATE CITY            ALT_NAMES
    FL    FT. MYERS       FORT MYERS@FT MYERS
    FL    NORTH FT MYERS  N.FT.MYERS@N. FORT MYERS@NORTH FORT MYERS

Upvotes: 1

Views: 35

Answers (2)

jezrael
jezrael

Reputation: 862611

For me working:

df = explode(df.assign(ALT_NAMES=df.ALT_NAMES.str.split('@')), 'ALT_NAMES')
print (df)
  STATE CITY ALT_NAMES
0    S1   C1        A1
1    S1   C1        A2
2    S2   C2        A3
3    S2   C2        A4
4    S2   C2        A5

Another pure pandas solution:

df = (df.join(df.pop('ALT_NAMES')
                .str.split('@', expand=True)
                .stack()
                .reset_index(level=1, drop=True)
                .rename('ALT_NAMES'))
        .reset_index(drop=True ))
print (df)
  STATE            CITY         ALT_NAMES
0    FL       FT. MYERS        FORT MYERS
1    FL       FT. MYERS          FT MYERS
2    FL  NORTH FT MYERS        N.FT.MYERS
3    FL  NORTH FT MYERS     N. FORT MYERS
4    FL  NORTH FT MYERS  NORTH FORT MYERS

Upvotes: 2

cs95
cs95

Reputation: 402483

This is an optimised version for your data.

from itertools import chain
v = df.pop('ALT_NAMES').str.split('@')  

df = pd.DataFrame(
    df.values.repeat(v.str.len(), axis=0), columns=df.columns)
df['ALT_NAMES'] = list(chain.from_iterable(v))

df
  STATE CITY ALT_NAMES
0    S1   C1        A1
1    S1   C1        A2
2    S2   C2        A3
3    S2   C2        A4
4    S2   C2        A5

Upvotes: 1

Related Questions