Reputation: 131
From a dataframe, I want to split col1 as follows: the number before the first | symbol goes into list a, the second number after that goes into list b, and the remaining string from col1, together with text1, text2, and text3, goes into list text.
col1 | text1 | text2 | text3
1|6|Show | us the | straight way | null
109|2|I | worship | not that | which ye worship
The output I expect:
a = [1, 109]
b = [6, 2]
text = ['Show us the straight way', 'I worship not that which ye worship']
What is the best way to do this?
Upvotes: 0
Views: 570
Reputation: 402433
This is straightforward assuming col1 has 3 pipe-separated elements throughout.
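# Split col1 on '|' and unpack the three pieces column-wise
a, b, C = zip(*df.col1.str.split('|'))
# Join the remaining text columns row-wise, skipping missing values
D = df.drop(columns='col1').agg(lambda x: ' '.join(x.dropna()), axis=1)
# Prepend the text piece from col1 (C) to the joined text (D)
c = [x + ' ' + y for x, y in zip(C, D)]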
print(a)
('1', '109')
print(b)
('6', '2')
print(c)
['Show us the straight way', 'I worship not that which ye worship']
Note that a and b are collections of strings; you can map them to numeric with
a, b = map(pd.to_numeric, (a,b))
...to get arrays of integers.
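For reference, here is a self-contained sketch of the steps above. The frame is an assumed reconstruction of the question's data (the "null" cell is taken to be a missing value), and the result name text follows the question rather than the answer's c.

```python
import pandas as pd

# Assumed reconstruction of the question's frame; "null" is treated as missing.
df = pd.DataFrame({
    "col1": ["1|6|Show", "109|2|I"],
    "text1": ["us the", "worship"],
    "text2": ["straight way", "not that"],
    "text3": [None, "which ye worship"],
})

# Split col1 into three pieces per row, then regroup the pieces by position.
a, b, C = zip(*df["col1"].str.split("|"))

# Join the remaining text columns row-wise, skipping missing values.
D = df.drop(columns="col1").agg(lambda row: " ".join(row.dropna()), axis=1)

# Prepend the text piece taken from col1 to the joined text.
text = [head + " " + tail for head, tail in zip(C, D)]

# Convert the tuples of digit strings to plain lists of integers.
a, b = (pd.to_numeric(x).tolist() for x in (a, b))
```

This only works when every col1 value has exactly three pieces; otherwise the three-way unpacking fails.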
To handle the generic case of col1 having any number of values, you will need to split with expand=True and mask the digit-only cells:
v = df.col1.str.split('|', expand=True)
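# Guard against the None padding that expand=True adds to shorter rows
m = v.applymap(lambda s: isinstance(s, str) and s.isdigit())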
a,b,*_ = v[m].T.agg(lambda x: x.dropna().tolist(), axis=1)
print(a)
['1', '109']
print(b)
['6', '2']
C can be computed similarly:
C = v[~m].agg(lambda x: x.dropna().str.cat(sep=' '), axis=1).tolist()
The final list c can then be computed as before.
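Putting the generic case together, a minimal sketch might look like the following. The frame is hypothetical (its first row carries an extra text piece in col1), and the mask is built with apply plus Series.map and an isinstance guard, an assumption made here so that the None padding from expand=True does not raise.

```python
import pandas as pd

# Hypothetical frame: col1 holds a varying number of pipe-separated pieces.
df = pd.DataFrame({
    "col1": ["1|6|Show|us", "109|2|I"],
    "text1": ["the", "worship"],
    "text2": ["straight way", "not that"],
    "text3": [None, "which ye worship"],
})

# One column per piece; shorter rows are padded with None.
v = df["col1"].str.split("|", expand=True)

# True where a cell is a digit-only string; None cells come out False.
m = v.apply(
    lambda col: col.map(lambda s: isinstance(s, str) and s.isdigit())
).astype(bool)

# Collect the digit cells position by position: first position -> a, second -> b.
a, b, *_ = v[m].T.apply(lambda x: x.dropna().tolist(), axis=1)
a, b = [int(x) for x in a], [int(x) for x in b]

# Join the non-digit pieces of col1 for each row.
C = v[~m].apply(lambda x: x.dropna().str.cat(sep=" "), axis=1).tolist()

# Join the other text columns and prepend the col1 text, as before.
D = df.drop(columns="col1").apply(lambda x: " ".join(x.dropna()), axis=1)
text = [head + " " + tail for head, tail in zip(C, D)]
```

Transposing before the row-wise apply keeps the result a Series of lists even when the positions hold different numbers of digit cells.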
Upvotes: 1