jaymac

Reputation: 11

Column lists into string

I have a dataset that looks like this:


id  keyPhrases
0   [word1, word2]
1   [word4, word 5 and 6, word7]
2   [word8, etc, etc]

Each value in 'keyPhrases' is a list. I'd like to expand each list so that every element becomes its own row, as a string.

The 'id' column is not important right now.

I have already tried df.values, from_records, etc., without success.

Expected:


keyPhrases
word1
word2
word3
word4

Upvotes: 0

Views: 72

Answers (8)

RonaldoMoura

Reputation: 21

The numpy answer above is very good, but I would like to contribute a code snippet that is not the most performant yet is the simplest to understand.

import pandas as pd

lista = [[['word1', 'word2']], [['word4', 'word5', 'word6', 'word7']], [['word8', 'word9', 'word10']]]
df = pd.DataFrame(lista, columns=['keyPhrases'])

# Named 'words' rather than 'list' so the built-in list type is not shadowed
words = []
for key in df.keyPhrases:
    for element in key:
        words.append(element)
words
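The nested loop above can also be written as one list comprehension. This is only a minimal sketch, assuming every cell of keyPhrases holds a plain Python list; the sample data and the name `words` are illustrative.

```python
import pandas as pd

# Illustrative data: each cell of 'keyPhrases' holds a list of strings.
df = pd.DataFrame({'keyPhrases': [['word1', 'word2'],
                                  ['word4', 'word5'],
                                  ['word8']]})

# The outer loop walks the cells, the inner loop walks each list's elements.
words = [element for key in df['keyPhrases'] for element in key]
print(words)
```

The order of the two `for` clauses matches the order of the nested loops, so the result keeps the row order of the DataFrame.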

Upvotes: 1

prosti

Reputation: 46351

Found another way to do it:

df['keyPhrases'] = df['keyPhrases'].str.split(',') #to make arrays
df['keyPhrases'] = df['keyPhrases'].astype(str) #back to strings
s=''.join(df.keyPhrases).replace('[','').replace(']','\n').replace(',','\n') #replace magic
print(s)

word1
 word2
word4
 word 5 and 6
 word7
word8
 etc
 etc

Upvotes: 1

I am not aware of a built-in function that does this in a single line of code. The workaround below should meet your requirement. If a built-in function can do this more simply, I would be glad to hear about it.

import pandas as pd

#Existing DF where the data is in the form of list
df = pd.DataFrame(columns=['ID', 'value_list'])
#New DF where the data should be atomic
df_new = pd.DataFrame(columns=['ID', 'value_single'])

#Sample Data
row_1 = ['A', 'B', 'C', 'D']
row_2 = ['D', 'E', 'F']
row_3 = ['F', 'G']
row_4 = ['H', 'I']
row_5 = ['J']

#Data Push to existing DF (the rows are collected in a list, so eval is not needed)
rows = [row_1, row_2, row_3, row_4, row_5]
for i, row in enumerate(rows):
    df.loc[i, 'ID'] = i
    df.loc[i, 'value_list'] = row

#Data Push to new DF where list is pushed as atomic data
counter = 0
i=0
while(i<len(df)):
    j=0
    while(j<len(df['value_list'][i])):
        df_new.loc[counter, 'ID'] = df['ID'][i]
        df_new.loc[counter, 'value_single'] = df['value_list'][i][j]
        counter = counter + 1
        j = j+1
    i = i+1

print(df_new)

This link could help with your requirement.
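For what it is worth, newer pandas releases (0.25 and later) ship `DataFrame.explode`, which may be the built-in asked about here. The sketch below is not the code from this answer; it assumes a small illustrative frame with the same column names and reuses `df_new` and `value_single` only for continuity.

```python
import pandas as pd

# Illustrative data in the same shape as the answer's existing DF.
df = pd.DataFrame({
    'ID': [0, 1, 2],
    'value_list': [['A', 'B'], ['D'], ['F', 'G']],
})

# explode() emits one row per list element and repeats the ID for each.
df_new = (
    df.explode('value_list')
      .rename(columns={'value_list': 'value_single'})
      .reset_index(drop=True)
)
print(df_new)
```

`reset_index(drop=True)` is there because `explode` repeats the original index label for every element of a list.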

Upvotes: 1

jaymac

Reputation: 11

Both the numpy and the itertools methods worked fine.

I ended up using the itertools method, with a for loop that writes each element to a file on its own line.

It saved me a lot of time and code.

Thanks a lot!!


import itertools

# textfile is a file object that has already been opened for writing
for elem in itertools.chain.from_iterable(df['keyPhrases'].values):
    textfile.write(elem + "\n")
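A self-contained version of the loop might look like the sketch below. The output path and the sample data are hypothetical stand-ins, since the real file name is not shown above.

```python
import itertools
import os
import tempfile

import pandas as pd

# Illustrative data: each cell of 'keyPhrases' holds a list of strings.
df = pd.DataFrame({'keyPhrases': [['word1', 'word2'], ['word4']]})

# Hypothetical output location; substitute the real path.
path = os.path.join(tempfile.mkdtemp(), 'key_phrases.txt')

# The with-block closes the file even if a write fails part-way through.
with open(path, 'w', encoding='utf-8') as textfile:
    for elem in itertools.chain.from_iterable(df['keyPhrases']):
        textfile.write(elem + "\n")
```

Opening the file in a `with` block avoids having to remember a separate `textfile.close()` call.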

Upvotes: 1

Umesh

Reputation: 971

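from functools import reduce  # reduce is not a built-in in Python 3

keyPhrases = df.keyPhrases.tolist()

# Concatenate the lists pairwise into one flat list
reduce(lambda x, y: x + y, keyPhrases)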

Upvotes: 1

BENY

Reputation: 323266

A fun way, but not recommended: summing lists concatenates them one at a time, which gets slow on large columns.

df.keyPhrases.sum()
Out[520]: ['word1', 'word2', 'word4', 'word5', 'word7', 'word8', 'word9']

Upvotes: 2

anky

Reputation: 75080

np.concatenate()

import numpy as np
np.concatenate(df.keyPhrases) #data courtesy vurmux

array(['word1', 'word2', 'word4', 'word5', 'word7', 'word8', 'word9'],
  dtype='<U5')

Another way:

import functools
import operator
functools.reduce(operator.iadd, df.keyPhrases, [])
#['word1', 'word2', 'word4', 'word5', 'word7', 'word8', 'word9']

Upvotes: 2

vurmux

Reputation: 10020

You can use itertools.chain in combination with dataframe column selection:

import itertools
import pandas as pd

df = pd.DataFrame({
    'keyPhrases': [
        ['word1', 'word2'],
        ['word4', 'word5', 'word7'],
        ['word8', 'word9']
    ],
    'id': [1,2,3]
})

for elem in itertools.chain.from_iterable(df['keyPhrases'].values):
    print(elem)

will print:

word1
word2
word4
word5
word7
word8
word9
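If the goal is a new single-column DataFrame, as in the question's expected output, rather than printed lines, the chained iterator can be collected into a list first. A minimal sketch using the same sample data; the name `flat` is an illustrative choice.

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'keyPhrases': [
        ['word1', 'word2'],
        ['word4', 'word5', 'word7'],
        ['word8', 'word9']
    ],
    'id': [1, 2, 3]
})

# Collect the flattened elements and wrap them in a one-column DataFrame.
flat = pd.DataFrame({
    'keyPhrases': list(itertools.chain.from_iterable(df['keyPhrases']))
})
print(flat)
```

Here the `id` column is dropped, which matches the question's remark that it is not important right now.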

Upvotes: 2
