Reputation: 1777
Have a dataframe df:
import pandas as pd
import numpy as np
df=pd.DataFrame(np.array([('x', 'y')] + [('y', 'x')] +
list([0, np.nan]*2)), columns=['Col'])
df
How can df be split into two columns as follows?:
Col1 Col2
0 x y
1 y x
2 0 0
3 NaN NaN
4 0 0
5 NaN NaN
Upvotes: 1
Views: 39
Reputation: 863236
Use list comprehension
with convert scalars to tuples:
df1 = pd.DataFrame([x if isinstance(x, tuple) else (x,x) for x in df['Col']],
columns=['Col1','Col2'])
print (df1)
Col1 Col2
0 x y
1 y x
2 0 0
3 NaN NaN
4 0 0
5 NaN NaN
More general solution:
lens = int(df['Col'].str.len().max())
df1 = pd.DataFrame([x if isinstance(x, tuple) else [x] * lens for x in df['Col']])
Another solution, slowier in large data:
df1 = df['Col'].apply(pd.Series).ffill(axis=1)
Performance:
df = pd.concat([df] * 1000, ignore_index=True)
In [51]: %%timeit
...: df1 = pd.DataFrame([x if isinstance(x, tuple) else (x,x) for x in df['Col']],
...: columns=['Col1','Col2'])
...:
2.42 ms ± 45.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [52]: %%timeit
...: df['Col'].apply(pd.Series).ffill(axis=1)
...:
1 s ± 23.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#coldspeed solution
In [53]: %%timeit
...: v = pd.to_numeric(df.Col, errors='coerce')
...: pd.DataFrame({
...: 'Col1': v.fillna(df.Col.str[0]),
...: 'Col2': v.fillna(df.Col.str[-1])})
...:
15.8 ms ± 472 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 2
Reputation: 402854
A nice, concise solution is to use pd.to_numeric
to convert non-numeric data to NaN, and then fillna
.
v = pd.to_numeric(df.Col, errors='coerce')
pd.DataFrame({
'Col1': v.fillna(df.Col.str[0]),
'Col2': v.fillna(df.Col.str[-1])})
Col1 Col2
0 x y
1 y x
2 0 0
3 NaN NaN
4 0 0
5 NaN NaN
Solution, for multiple possible columns:
pd.DataFrame({
f'Col{i+1}': v.fillna(df.Col.str[i])
for i in range(int(df.Col.str.len().max()))})
Col1 Col2
0 x y
1 y x
2 0 0
3 NaN NaN
4 0 0
5 NaN NaN
Upvotes: 2