Reputation: 7732
Here is my dataframe:
A B
0 asdf|afsdf|fasd sdsd|wer|wer
1 sdfsdf sdfsdff
2 sdf|s sdfsde|sdf
I would like to form column C
which would contain concatenated values from column A
and column B
up to first |
, and if |
is not present, then it will simply concatenate two columns. Also while concatenating I would like to insert --
here is how column C
should look like:
C
0 asdf--sdsd
1 sdfsdf--sdfsdff
2 sdf--sdfsde
I can go through every single row with df.loc
and get what I need, but it is slow and I wonder if there is faster way of doing it.
Upvotes: 4
Views: 158
Reputation: 294488
Short answer using str
and split
df['C'] = df.A.str.split('|').str.get(0).add('--') \
+ df.B.str.split('|').str.get(0)
df
You can expand this out further
df['C'] = df.A.str.split('|', expand=True).stack() \
.add('--').add(df.B.str.split('|', expand=True).stack()) \
.groupby(level=0).apply('|'.join)
df
Upvotes: 3
Reputation: 1444
There may be a more fancy way, but this works:
In [1]: import pandas as pd
In [2]: d = {'A': ('asdf|afsdf|fasd', 'sdfsdf', 'sdf|s'),
'B': ('sdsd|wer|wer', 'sdfsdff', 'sdfsde|sdf')}
In [3]: data = pd.DataFrame(d)
In [4]: data['C'] = data['A'].str.split('|').str.get(0) + "--" + data['B'].str.split('|').str.get(0)
In [5]: data
Out[5]:
A B C
0 asdf|afsdf|fasd sdsd|wer|wer asdf--sdsd
1 sdfsdf sdfsdff sdfsdf--sdfsdff
2 sdf|s sdfsde|sdf sdf--sdfsde
References on working with strings in pandas
can be found in their documentation.
Upvotes: 5