Reputation: 23
I have a pandas dataframe that looks like:
flag
0 NNxxNxNNxNN
1 xxNNNNNNNNN
2 xxxNNxNNNNN
3 xxxxNxxxxxN
4 xxxxxxNxxxx
5 xxxxxxxNxNN
I would like to split each string into one new column per character, like this:
col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11
0 N N x x N x N N x N N
1 x x N N N N N N N N N
2 x x x N N x N N N N N
3 x x x x N x x x x x N
4 x x x x x x N x x x x
5 x x x x x x x N x N N
My dataframe has several million rows - is there an efficient way to do this?
Upvotes: 2
Views: 1467
Reputation: 323226
Using tolist with pd.DataFrame:
pd.DataFrame(df.flag.apply(list).tolist())
Out[905]:
0 1 2 3 4 5 6 7 8 9 10
0 N N x x N x N N x N N
1 x x N N N N N N N N N
2 x x x N N x N N N N N
3 x x x x N x x x x x N
4 x x x x x x N x x x x
5 x x x x x x x N x N N
Or using str.extractall:
df.flag.str.extractall('(.)')[0].unstack()
Out[931]:
match 0 1 2 3 4 5 6 7 8 9 10
0 N N x x N x N N x N N
1 x x N N N N N N N N N
2 x x x N N x N N N N N
3 x x x x N x x x x x N
4 x x x x x x N x x x x
5 x x x x x x x N x N N
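If you want the col1 ... col11 labels from the question rather than 0 ... 10, one option is to pass the column names when building the frame. This is only a sketch: the sample data is copied from the question, the names chars, width and columns are mine, and it assumes every string has the same length.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"flag": [
    "NNxxNxNNxNN",
    "xxNNNNNNNNN",
    "xxxNNxNNNNN",
    "xxxxNxxxxxN",
    "xxxxxxNxxxx",
    "xxxxxxxNxNN",
]})

# list(s) splits a string into its characters, giving one list per row.
chars = [list(s) for s in df["flag"]]

# Assumption: all strings have the same length, so the first row sets the width.
width = len(df["flag"].iloc[0])
columns = [f"col{i}" for i in range(1, width + 1)]

# Keep the original index so the result lines up with df.
new_df = pd.DataFrame(chars, index=df.index, columns=columns)
```

The result can be joined back onto the original frame with pd.concat([df, new_df], axis=1) if you need both.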
Upvotes: 1
Reputation: 51335
You can do (with numpy imported as np):
new_df = pd.DataFrame(np.stack(df.flag.apply(list).values))
>>> new_df
0 1 2 3 4 5 6 7 8 9 10
0 N N x x N x N N x N N
1 x x N N N N N N N N N
2 x x x N N x N N N N N
3 x x x x N x x x x x N
4 x x x x x x N x x x x
5 x x x x x x x N x N N
Or (this builds a Series for every row, so expect it to be much slower on several million rows):
new_df = df.flag.apply(lambda x: pd.Series(list(x)))
>>> new_df
0 1 2 3 4 5 6 7 8 9 10
0 N N x x N x N N x N N
1 x x N N N N N N N N N
2 x x x N N x N N N N N
3 x x x x N x x x x x N
4 x x x x x x N x x x x
5 x x x x x x x N x N N
To get your column names, just add add_prefix to either of the calls above:
new_df = df.flag.apply(lambda x: pd.Series(list(x))).add_prefix('col_')
>>> new_df
col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 col_10
0 N N x x N x N N x N N
1 x x N N N N N N N N N
2 x x x N N x N N N N N
3 x x x x N x x x x x N
4 x x x x x x N x x x x
5 x x x x x x x N x N N
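Since the question mentions several million rows, it may also be worth trying a NumPy-only route that avoids creating a Python list per row. This is a sketch and not something I have benchmarked on your data. It assumes every string has exactly the same length (shorter strings would leave empty cells), and the names arr, width, chars and fast_df are mine.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"flag": [
    "NNxxNxNNxNN",
    "xxNNNNNNNNN",
    "xxxNNxNNNNN",
    "xxxxNxxxxxN",
    "xxxxxxNxxxx",
    "xxxxxxxNxNN",
]})

# Convert to a fixed-width unicode array (dtype '<U11' for this data).
arr = df["flag"].to_numpy().astype(str)

# Number of characters per cell, derived from the dtype's item size.
width = arr.dtype.itemsize // np.dtype("U1").itemsize

# Reinterpret each fixed-width cell as `width` one-character cells,
# then reshape back to one row per original string.
chars = arr.view("U1").reshape(len(arr), width)

fast_df = pd.DataFrame(chars, index=df.index).add_prefix("col_")
```

The view only reinterprets the existing buffer, so the per-character work happens in NumPy rather than in a Python-level loop.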
Upvotes: 1