Reputation: 1608
I have a data frame with a single column 'data' that contains words separated by spaces. I want to split each row on whitespace so that every word gets its own row. I tried the following code, but it does not work:
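from itertools import chain
import numpy as np
import pandas as pd
def chainer(s):
    # flatten the per-row word lists into a single list
    return list(chain.from_iterable(s.str.split(r'\s+')))
lengths = df['data'].str.split(r'\s+').map(len)
# repeats each original string once per word it contains
df_m = pd.DataFrame({"data" : np.repeat(df["data"], lengths)})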
Example data frame:
words = ["a b c d e", "b m g f e", "c", "w"]
df = pd.DataFrame({"data": words})
        data
0  a b c d e
1  b m g f e
2          c
3          w
Upvotes: 0
Views: 272
Reputation: 1839
Below is my attempt.
import pandas as pd
words = ['oneword', 'word1 word2 word3', 'hey there hello word', 'stackoverflow is amazing']
# Split each string on spaces and flatten the pieces into one list.
flat_list = [item for sublist in words for item in sublist.split(' ')]
# Put the flattened list into a DataFrame.
df = pd.DataFrame({"data": flat_list})
print(df)
             data
0         oneword
1           word1
2           word2
3           word3
4             hey
5           there
6           hello
7            word
8   stackoverflow
9              is
10        amazing
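The same flattening can be applied to the column from the question instead of a plain list. This is only a minimal sketch, assuming the example frame from the question and assuming that splitting on any run of whitespace is what is wanted; the name df_m is borrowed from the question's attempt.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"data": ["a b c d e", "b m g f e", "c", "w"]})

# str.split() with no argument splits on runs of whitespace,
# so repeated spaces do not produce empty strings.
flat_list = [word for row in df["data"] for word in row.split()]

# One word per row.
df_m = pd.DataFrame({"data": flat_list})
print(df_m)
```

Iterating over df["data"] yields the row strings in order, so the words keep their original sequence.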
Upvotes: 1
Reputation: 6639
Are you looking for something like this?
import pandas as pd
df = pd.DataFrame()
df['text'] = ['word1 word2 word3', 'hey there hello word', 'stackoverflow is amazing']
Input:
                       text
0         word1 word2 word3
1      hey there hello word
2  stackoverflow is amazing
Do:
# Split on whitespace into one column per word, then stack the columns into a single sequence.
x = df['text'].str.split(expand=True).stack().values
new_df = pd.DataFrame()
new_df['words'] = x.tolist()
Output:
           words
0          word1
1          word2
2          word3
3            hey
4          there
5          hello
6           word
7  stackoverflow
8             is
9        amazing
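As an alternative to split(expand=True).stack(), newer pandas versions offer Series.explode, which may read a little more directly. This is a sketch of a different technique, not the method above, and it assumes pandas 0.25 or later; the column names text and words follow the example.

```python
import pandas as pd

df = pd.DataFrame({"text": ["word1 word2 word3",
                            "hey there hello word",
                            "stackoverflow is amazing"]})

# Split each row into a list of words, then give every list element its own row.
# Series.explode is available from pandas 0.25 onward.
new_df = (
    df["text"]
    .str.split()
    .explode()
    .reset_index(drop=True)   # drop the repeated original row labels
    .to_frame(name="words")
)
print(new_df)
```

Without reset_index(drop=True), each word would keep the index of the row it came from, which can be useful if the words need to be traced back to their source row.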
Upvotes: 3