Pandas manipulating strings in two columns

Question

Here is my dataframe:

           A                B
0   asdf|afsdf|fasd    sdsd|wer|wer
1   sdfsdf             sdfsdff
2   sdf|s              sdfsde|sdf

I would like to form column C which would contain concatenated values from column A and column B up to first |, and if | is not present, then it will simply concatenate two columns. Also while concatenating I would like to insert -- here is how column C should look like:

         C
0   asdf--sdsd
1   sdfsdf--sdfsdff
2   sdf--sdfsde

I can go through every single row with df.loc and get what I need, but it is slow and I wonder if there is faster way of doing it.

piRSquared · Accepted Answer

Short answer using str and split

df['C'] = df.A.str.split('|').str.get(0).add('--') \
        + df.B.str.split('|').str.get(0)
df

You can expand this out further

df['C'] = df.A.str.split('|', expand=True).stack() \
    .add('--').add(df.B.str.split('|', expand=True).stack()) \
    .groupby(level=0).apply('|'.join)
df

Pandas manipulating strings in two columns

Answers (2)

Related Questions