Reputation: 755
I have the dataframe below:
import pandas
df = pandas.DataFrame({"terms" : [[['the', 'boy', 'and', 'the goat'],['a', 'girl', 'and', 'the cat']], [['fish', 'boy', 'with', 'the dog'],['when', 'girl', 'find', 'the mouse'], ['if', 'dog', 'see', 'the cat']]]})
My desired outcome is as follows:
df2 = pandas.DataFrame({"terms" : ['the boy and the goat','a girl and the cat', 'fish boy with the dog','when girl find the mouse', 'if dog see the cat']})
Is there a simple way to accomplish this without having to use a for loop to iterate through each row for each element and substring:
result = pandas.DataFrame()
for i in range(len(df.terms.tolist())):
    x = df.terms.tolist()[i]
    for y in x:
        z = str(y).replace(",",'').replace("'",'').replace('[','').replace(']','')
        flattened = pandas.DataFrame({'flattened_term':[z]})
        result = result.append(flattened)
print(result)
Thank you.
Upvotes: 2
Views: 209
Reputation: 95957
There is certainly no way to avoid loops here, at least not implicitly. pandas is not designed to handle list
objects as elements; it deals magnificently with numeric data, and pretty well with strings. In any case, your fundamental problem is that you are using pandas.DataFrame.append
in a loop, which is a quadratic-time algorithm (the entire DataFrame is re-created on each iteration). But you can probably just get away with the following, and it should be significantly faster:
>>> df
                                               terms
0  [[the, boy, and, the goat], [a, girl, and, the...
1  [[fish, boy, with, the dog], [when, girl, find...
>>> pandas.DataFrame([' '.join(term) for row in df.itertuples() for term in row.terms])
                          0
0      the boy and the goat
1        a girl and the cat
2     fish boy with the dog
3  when girl find the mouse
4        if dog see the cat
>>>
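For comparison, here is a minimal sketch of two other ways to write the same thing. It assumes the df from the question and reuses the question's flattened_term column name. The first keeps the explicit loops but collects plain strings in a list and builds the DataFrame once, which avoids the repeated append (DataFrame.append was also removed in pandas 2.0). The second is a different technique from the comprehension above: it assumes pandas 0.25 or later for Series.explode, and it still loops internally, so treat it as a readability option rather than a guaranteed speed-up.

```python
import pandas

df = pandas.DataFrame({"terms": [
    [['the', 'boy', 'and', 'the goat'], ['a', 'girl', 'and', 'the cat']],
    [['fish', 'boy', 'with', 'the dog'], ['when', 'girl', 'find', 'the mouse'],
     ['if', 'dog', 'see', 'the cat']],
]})

# Option 1: keep the explicit loops, but accumulate plain strings in a
# list and construct the DataFrame a single time at the end.
rows = []
for sentences in df["terms"]:      # each cell holds a list of word lists
    for words in sentences:        # each inner list is one sentence
        rows.append(" ".join(words))
looped = pandas.DataFrame({"flattened_term": rows})

# Option 2: explode gives every inner list its own row, then
# Series.str.join glues the words of each list together.
exploded = (
    df["terms"]
    .explode()
    .str.join(" ")
    .reset_index(drop=True)        # explode repeats the original index
    .to_frame("flattened_term")
)

print(looped)
print(exploded)
```

Both variants should produce the same five rows as the comprehension above. Joining with ' '.join also avoids the str(y).replace(...) chain from the question, which would strip commas, quotes and brackets that happen to occur inside the words themselves.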
Upvotes: 3