Reputation: 93
I have a pandas DataFrame in which one column contains strings. I want to split that column into an unknown number of columns according to word count.
Suppose I have a DataFrame df:
Index Text
0 He codes
1 He codes well in python
2 Python is great language
3 Pandas package is very handy
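For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: it assumes the column is named Text and that the index is the default integer index, as the table suggests.

```python
import pandas as pd

# Sample data copied from the table above; the default RangeIndex
# stands in for the "Index" column.
df = pd.DataFrame(
    {
        "Text": [
            "He codes",
            "He codes well in python",
            "Python is great language",
            "Pandas package is very handy",
        ]
    }
)
print(df)
```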
Now I want to divide the Text column into multiple columns, each containing 2 words (the last one may hold a single leftover word).
Index  0               1               2
0      He codes        NaN             NaN
1      He codes        well in         python
2      Python is       great language  NaN
3      Pandas package  is very         handy
How can I do this in Python? Please help. Thanks in advance.
Upvotes: 4
Views: 3681
Reputation: 23099
IIUC, we can use str.split, then groupby with cumcount and floor division, and finally unstack:
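# Split on whitespace and stack into one word per row, keyed by the original row label.
s = (
    df["Text"]
    .str.split(r"\s+", expand=True)
    .stack()
    .to_frame("words")
    .reset_index(1, drop=True)
)
# Number the words within each row, then floor-divide by 2 so each consecutive pair shares a label.
s["count"] = s.groupby(level=0).cumcount() // 2
# Join each pair with a space and pivot the pair labels into columns.
final = s.rename_axis("idx").groupby(["idx", "count"])["words"].agg(" ".join).unstack(1)
print(final)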
count               0               1       2
idx
0            He codes             NaN     NaN
1            He codes         well in  python
2           Python is  great language     NaN
3      Pandas package         is very   handy
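To see why the floor division pairs the words up, it may help to look at the intermediate labels on a smaller frame. The sketch below is not the code above: it swaps in str.split().explode() to get one word per row, and the names steps, position and pair are made up for illustration.

```python
import pandas as pd

# Two rows from the question are enough to show the labelling.
df = pd.DataFrame({"Text": ["He codes", "He codes well in python"]})

# One word per row; the original row label is repeated in the index.
steps = df["Text"].str.split().explode().to_frame("word")

# cumcount numbers the words within each original row: 0, 1, 2, ...
steps["position"] = steps.groupby(level=0).cumcount()

# Floor division by 2 maps positions 0 and 1 to pair 0, 2 and 3 to pair 1, and so on.
steps["pair"] = steps["position"] // 2

print(steps)
```

Grouping by the original row label and the pair label, then joining with a space, gives the two-word chunks.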
Upvotes: 2
Reputation: 1213
Given a DataFrame df whose Text column holds sentences that need to be split into two-word chunks:
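import pandas as pd

def splitter(s):
    # Split on whitespace, then rejoin the words two at a time.
    spl = s.split()
    return [" ".join(spl[i:i+2]) for i in range(0, len(spl), 2)]

# Each row becomes a list of chunks; shorter rows are padded with None.
df_new = pd.DataFrame(df["Text"].apply(splitter).to_list())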
#           0        1       2
# 0  He codes     well    None
# 1  He codes  well in  Python
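If the chunk size should be adjustable, the same idea can be sketched with a parameter. The helper name split_every and its n argument are made up here, and the data is the frame from the question (assumed to use a Text column and default index).

```python
import pandas as pd

def split_every(text, n=2):
    """Return the words of text regrouped into chunks of n words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]

df = pd.DataFrame(
    {
        "Text": [
            "He codes",
            "He codes well in python",
            "Python is great language",
            "Pandas package is very handy",
        ]
    }
)

# Build one list of chunks per row; passing index=df.index keeps the
# original row labels, and shorter rows are padded with missing values.
chunks = pd.DataFrame([split_every(t, n=2) for t in df["Text"]], index=df.index)
print(chunks)
```

Changing n=2 to n=3 would give three-word chunks without touching the rest.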
Upvotes: 7