Reputation: 11
I have a dataset that looks like this:
id    keyPhrases
0     [word1, word2]
1     [word4, word 5 and 6, word7]
2     [word8, etc, etc]
Each value in 'keyPhrases' is a list. I'd like to expand each list so that every element becomes its own row (as a string).
The 'id' column is not important right now.
I already tried df.values, from_records, etc.
Expected:
keyPhrases
word1
word2
word3
word4
Upvotes: 0
Views: 72
Reputation: 21
The answer given above using the numpy library really is very good, but I'll contribute a plain loop version: not performant, but the simplest to understand.
import pandas as pd

lista = [[['word1', 'word2']], [['word4', 'word5', 'word6', 'word7']], [['word8', 'word9', 'word10']]]
df = pd.DataFrame(lista, columns=['keyPhrases'])

words = []  # avoid shadowing the built-in name `list`
for key in df.keyPhrases:
    for element in key:
        words.append(element)
words
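For reference, the same flattening can be written as a one-line list comprehension (assuming the same df as above):
words = [element for key in df.keyPhrases for element in key]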
Upvotes: 1
Reputation: 46351
Found another way to do it:
df['keyPhrases'] = df['keyPhrases'].str.split(',')  # split the strings into lists
df['keyPhrases'] = df['keyPhrases'].astype(str)     # back to strings
s = ''.join(df.keyPhrases).replace('[', '').replace(']', '\n').replace(',', '\n')  # strip brackets, one word per line
print(s)
word1
word2
word4
word 5 and 6
word7
word8
etc
etc
Upvotes: 1
Reputation: 11
I am not sure of any existing function that can do this in a single line of code. The workaround below solves the requirement. If there is a built-in function that can get this done without the struggle, I would be glad to know.
import pandas as pd

# Existing DF where the data is in the form of a list
df = pd.DataFrame(columns=['ID', 'value_list'])

# New DF where the data should be atomic
df_new = pd.DataFrame(columns=['ID', 'value_single'])

# Sample data
rows = [
    ['A', 'B', 'C', 'D'],
    ['D', 'E', 'F'],
    ['F', 'G'],
    ['H', 'I'],
    ['J'],
]

# Push the sample data into the existing DF
for i, row in enumerate(rows):
    df.loc[i, 'ID'] = i
    df.at[i, 'value_list'] = row  # .at stores the whole list in a single cell

# Push each list element into the new DF as an atomic value
counter = 0
for i in range(len(df)):
    for value in df['value_list'][i]:
        df_new.loc[counter, 'ID'] = df['ID'][i]
        df_new.loc[counter, 'value_single'] = value
        counter += 1

print(df_new)
This link could help with your requirement.
Upvotes: 1
Reputation: 11
Both the numpy and the itertools methods worked fine.
I ended up using the itertools method with a for loop to write each line to a file.
It saved me a lot of time and code.
Thanks a lot!!
import itertools

for elem in itertools.chain.from_iterable(df['keyPhrases'].values):
    textfile.write(elem + "\n")  # textfile: an already-open, writable file object
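The same output can be produced in a single call (just a sketch, assuming textfile is the same open, writable file object):
textfile.write("\n".join(itertools.chain.from_iterable(df['keyPhrases'])) + "\n")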
Upvotes: 1
Reputation: 971
from functools import reduce  # needed on Python 3

keyPhrases = df.keyPhrases.tolist()
reduce(lambda x, y: x + y, keyPhrases)
Upvotes: 1
Reputation: 323266
A fun way, but not recommended (repeated list concatenation gets slow on large data):
df.keyPhrases.sum()
Out[520]: ['word1', 'word2', 'word4', 'word5', 'word7', 'word8', 'word9']
Upvotes: 2
Reputation: 75080
import numpy as np
np.concatenate(df.keyPhrases)  # data courtesy vurmux
array(['word1', 'word2', 'word4', 'word5', 'word7', 'word8', 'word9'],
dtype='<U5')
Another way:
import functools
import operator
functools.reduce(operator.iadd, df.keyPhrases, [])
#['word1', 'word2', 'word4', 'word5', 'word7', 'word8', 'word9']
Upvotes: 2
Reputation: 10020
You can use itertools.chain in combination with dataframe column selection:
import itertools
import pandas as pd

df = pd.DataFrame({
    'keyPhrases': [
        ['word1', 'word2'],
        ['word4', 'word5', 'word7'],
        ['word8', 'word9']
    ],
    'id': [1, 2, 3]
})

for elem in itertools.chain.from_iterable(df['keyPhrases'].values):
    print(elem)
will print:
word1
word2
word4
word5
word7
word8
word9
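If the goal is the single-column frame from the question rather than printed lines, a minimal sketch (reusing the same df and itertools.chain) would be:
flat = pd.DataFrame({'keyPhrases': list(itertools.chain.from_iterable(df['keyPhrases']))})
print(flat)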
Upvotes: 2