Oli

Reputation: 23

How to split a string in one column into new columns for each character in pandas

I have a pandas dataframe that looks like:

                flag
0        NNxxNxNNxNN
1        xxNNNNNNNNN
2        xxxNNxNNNNN
3        xxxxNxxxxxN
4        xxxxxxNxxxx
5        xxxxxxxNxNN

I would like to split each string into one new column per character, like this:

         col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11
0        N    N    x    x    N    x    N    N    x    N    N
1        x    x    N    N    N    N    N    N    N    N    N
2        x    x    x    N    N    x    N    N    N    N    N
3        x    x    x    x    N    x    x    x    x    x    N
4        x    x    x    x    x    x    N    x    x    x    x
5        x    x    x    x    x    x    x    N    x    N    N

My dataframe has several million rows. Is there an efficient way to do this?

Upvotes: 2

Views: 1467

Answers (2)

BENY

Reputation: 323226

Use apply(list) and tolist with pd.DataFrame:

pd.DataFrame(df.flag.apply(list).tolist())
Out[905]: 
  0  1  2  3  4  5  6  7  8  9  10
0  N  N  x  x  N  x  N  N  x  N  N
1  x  x  N  N  N  N  N  N  N  N  N
2  x  x  x  N  N  x  N  N  N  N  N
3  x  x  x  x  N  x  x  x  x  x  N
4  x  x  x  x  x  x  N  x  x  x  x
5  x  x  x  x  x  x  x  N  x  N  N
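For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea using the sample data from the question. The name chars is only illustrative, and passing index=df.index is my addition so that a non-default index on df would carry over to the result.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"flag": [
    "NNxxNxNNxNN", "xxNNNNNNNNN", "xxxNNxNNNNN",
    "xxxxNxxxxxN", "xxxxxxNxxxx", "xxxxxxxNxNN",
]})

# list(s) splits a string into its characters; the list of lists
# then becomes the rows of the new frame.
chars = pd.DataFrame([list(s) for s in df["flag"]], index=df.index)
print(chars)
```

If the strings had different lengths, the shorter rows would be padded with missing values on the right.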

Or use str.extractall:

df.flag.str.extractall('(.)')[0].unstack()
Out[931]: 
match 0  1  2  3  4  5  6  7  8  9  10
0      N  N  x  x  N  x  N  N  x  N  N
1      x  x  N  N  N  N  N  N  N  N  N
2      x  x  x  N  N  x  N  N  N  N  N
3      x  x  x  x  N  x  x  x  x  x  N
4      x  x  x  x  x  x  N  x  x  x  x
5      x  x  x  x  x  x  x  N  x  N  N
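The "match" label in the header above is the name of the column index left behind by unstack. A small sketch, assuming you would rather not keep that label, drops it with rename_axis; the name wide is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"flag": [
    "NNxxNxNNxNN", "xxNNNNNNNNN", "xxxNNxNNNNN",
    "xxxxNxxxxxN", "xxxxxxNxxxx", "xxxxxxxNxNN",
]})

# extractall yields one row per (original row, match number);
# unstack pivots the match number into columns.
wide = df["flag"].str.extractall("(.)")[0].unstack()

# Remove the leftover "match" name from the column index.
wide = wide.rename_axis(columns=None)
print(wide)
```

Note that "." in a regular expression does not match a newline by default, so this variant assumes the strings contain none.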

Upvotes: 1

sacuL

Reputation: 51335

You can do (with numpy imported as np):

new_df = pd.DataFrame(np.stack(df.flag.apply(list).values))
>>> new_df
  0  1  2  3  4  5  6  7  8  9  10
0  N  N  x  x  N  x  N  N  x  N  N
1  x  x  N  N  N  N  N  N  N  N  N
2  x  x  x  N  N  x  N  N  N  N  N
3  x  x  x  x  N  x  x  x  x  x  N
4  x  x  x  x  x  x  N  x  x  x  x
5  x  x  x  x  x  x  x  N  x  N  N

Or, though this builds one Series per row and so tends to be slow on large frames:

new_df = df.flag.apply(lambda x: pd.Series(list(x)))
>>> new_df
  0  1  2  3  4  5  6  7  8  9  10
0  N  N  x  x  N  x  N  N  x  N  N
1  x  x  N  N  N  N  N  N  N  N  N
2  x  x  x  N  N  x  N  N  N  N  N
3  x  x  x  x  N  x  x  x  x  x  N
4  x  x  x  x  x  x  N  x  x  x  x
5  x  x  x  x  x  x  x  N  x  N  N

To get column names, chain add_prefix onto either of the calls above. Note that the names start at col_0 rather than col1:

new_df = df.flag.apply(lambda x: pd.Series(list(x))).add_prefix('col_')
>>> new_df
  col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9 col_10
0     N     N     x     x     N     x     N     N     x     N      N
1     x     x     N     N     N     N     N     N     N     N      N
2     x     x     x     N     N     x     N     N     N     N      N
3     x     x     x     x     N     x     x     x     x     x      N
4     x     x     x     x     x     x     N     x     x     x      x
5     x     x     x     x     x     x     x     N     x     N      N
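Since the question asks about millions of rows and about names running from col1 to col11, here is a hedged sketch of a different technique: reinterpreting a fixed-width NumPy string array as single characters. It is not one of the methods above, I have not benchmarked it on your data, and it assumes every string has the same length. The names arr, width and chars are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"flag": [
    "NNxxNxNNxNN", "xxNNNNNNNNN", "xxxNNxNNNNN",
    "xxxxNxxxxxN", "xxxxxxNxxxx", "xxxxxxxNxNN",
]})

# Fixed-width unicode array (dtype '<U11' for this data).
arr = np.array(df["flag"].tolist(), dtype=str)

# NumPy stores each unicode character in 4 bytes.
width = arr.dtype.itemsize // 4

# View the same memory as single characters, one row per string.
# Assumption: all strings have equal length.
chars = arr.view("U1").reshape(len(arr), width)

# 1-based column names, as in the desired output of the question.
new_df = pd.DataFrame(
    chars,
    index=df.index,
    columns=[f"col{i}" for i in range(1, width + 1)],
)
print(new_df)
```

The view avoids creating a Python list per row, which is why it may scale better, but it is worth timing against the other options on a sample of the real data.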

Upvotes: 1
