Ruzannah

Reputation: 131

Split and concatenate strings from a DataFrame

From a DataFrame, I want to split col1 on the | symbol: the number before the first | goes into list a, the number after it goes into list b, and the remaining string from col1, together with text1, text2 and text3, goes into list text.

col1       | text1     | text2           | text3
1|6|Show   | us the    | straight way    | null
109|2|I    | worship   | not that        | which ye worship

The output I expect:

a = [1, 109]
b = [6, 2]
text = ['Show us the straight way', 'I worship not that which ye worship']

What is the best way to do this?

Upvotes: 0

Views: 570

Answers (1)

cs95

Reputation: 402433

This is straightforward assuming col1 has 3 pipe-separated elements throughout.

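# a and b hold the two numbers; C holds the leading word from col1
a, b, C = zip(*df.col1.str.split('|'))
# D joins the remaining text columns row by row, skipping nulls
D = df.drop(columns='col1').agg(lambda x: ' '.join(x.dropna()), axis=1)

c = [c + ' ' + d for c, d in zip(C, D)]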

print(a)
('1', '109')

print(b)
('6', '2')

print(c)
['Show us the straight way', 'I worship not that which ye worship']

Note that a and b are tuples of strings; you can convert them to numeric with

a, b = map(pd.to_numeric, (a,b))

...to get arrays of integers.
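
For reference, here is a minimal self-contained sketch of the three-element case, run against the two sample rows from the question. It is one possible way to write it, not the only one: the DataFrame is rebuilt by hand, None stands in for the "null" cell, and the names parts and rest are chosen here for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question; None stands in for the "null" cell.
df = pd.DataFrame({
    "col1": ["1|6|Show", "109|2|I"],
    "text1": ["us the", "worship"],
    "text2": ["straight way", "not that"],
    "text3": [None, "which ye worship"],
})

# Split col1 into at most three parts: two numbers and the leading word.
parts = df["col1"].str.split("|", n=2, expand=True)
a = pd.to_numeric(parts[0]).tolist()
b = pd.to_numeric(parts[1]).tolist()

# Join the leading word with the remaining text columns, skipping missing cells.
rest = df.drop(columns="col1")
text = [
    " ".join([head, *row.dropna()])
    for head, (_, row) in zip(parts[2], rest.iterrows())
]

print(a)
print(b)
print(text)
```

Passing n=2 caps the split at two separators, so any further | characters in the text part of col1 would stay inside the third piece.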


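To handle the general case, where col1 can hold any number of values, split with expand=True and mask the digit-only cells:

v = df.col1.str.split('|', expand=True)
# applymap is deprecated since pandas 2.1; use v.map(str.isdigit) there
m = v.applymap(str.isdigit)
a,b,*_ = v[m].T.agg(lambda x: x.dropna().tolist(), axis=1)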

print(a)
['1', '109']

print(b)
['6', '2']

C can be computed similarly:

C = v[~m].agg(lambda x: x.dropna().str.cat(sep=' '), axis=1).tolist()

The final list c can then be built from C and D as before.
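
Putting the general case together, the following is a rough sketch under a few assumptions: it reuses the question's two sample rows, builds the digit mask column by column with the .str accessor so that missing cells simply count as non-numeric, and uses the name text for the final list. It is meant as an illustration of the idea rather than a drop-in replacement for the snippets above.

```python
import pandas as pd

# Sample frame rebuilt from the question; None stands in for the "null" cell.
df = pd.DataFrame({
    "col1": ["1|6|Show", "109|2|I"],
    "text1": ["us the", "worship"],
    "text2": ["straight way", "not that"],
    "text3": [None, "which ye worship"],
})

v = df["col1"].str.split("|", expand=True)

# True only where a cell is an all-digit string; missing cells compare as False.
m = v.apply(lambda col: col.str.isdigit().eq(True))

# One list per split position, holding the numeric strings found there.
a, b, *_ = [v[col][m[col]].tolist() for col in v.columns]
a, b = [int(x) for x in a], [int(x) for x in b]

# Non-numeric pieces of col1, joined per row.
C = v.where(~m).apply(lambda row: " ".join(row.dropna()), axis=1).tolist()

# Remaining text columns, joined per row with missing cells skipped.
D = df.drop(columns="col1").apply(lambda row: " ".join(row.dropna()), axis=1)

text = [f"{head} {tail}".strip() for head, tail in zip(C, D)]

print(a)
print(b)
print(text)
```

Note that a and b are collected per split position, so they can end up with different lengths if some rows have fewer numeric fields than others.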

Upvotes: 1
