user1700890
user1700890

Reputation: 7732

Pandas manipulating strings in two columns

Here is my dataframe:

           A                B
0   asdf|afsdf|fasd    sdsd|wer|wer
1   sdfsdf             sdfsdff
2   sdf|s              sdfsde|sdf

I would like to form column C which would contain concatenated values from column A and column B up to first |, and if | is not present, then it will simply concatenate two columns. Also while concatenating I would like to insert -- here is how column C should look like:

         C
0   asdf--sdsd
1   sdfsdf--sdfsdff
2   sdf--sdfsde

I can go through every single row with df.loc and get what I need, but it is slow and I wonder if there is faster way of doing it.

Upvotes: 4

Views: 158

Answers (2)

piRSquared
piRSquared

Reputation: 294488

Short answer using str and split

df['C'] = df.A.str.split('|').str.get(0).add('--') \
        + df.B.str.split('|').str.get(0)
df

enter image description here


You can expand this out further

df['C'] = df.A.str.split('|', expand=True).stack() \
    .add('--').add(df.B.str.split('|', expand=True).stack()) \
    .groupby(level=0).apply('|'.join)
df

enter image description here

Upvotes: 3

Michael Gecht
Michael Gecht

Reputation: 1444

There may be a more fancy way, but this works:

In [1]: import pandas as pd

In [2]: d = {'A': ('asdf|afsdf|fasd', 'sdfsdf', 'sdf|s'),
             'B': ('sdsd|wer|wer', 'sdfsdff', 'sdfsde|sdf')}

In [3]: data = pd.DataFrame(d)

In [4]: data['C'] = data['A'].str.split('|').str.get(0) + "--" + data['B'].str.split('|').str.get(0)

In [5]: data
Out[5]: 
                 A             B                 C
0  asdf|afsdf|fasd  sdsd|wer|wer       asdf--sdsd
1           sdfsdf       sdfsdff  sdfsdf--sdfsdff
2            sdf|s    sdfsde|sdf      sdf--sdfsde

References on working with strings in pandas can be found in their documentation.

Upvotes: 5

Related Questions