Sara

Reputation: 1202

ValueError: arrays must all be same length - print dataframe to CSV

Thanks for stopping by! I was hoping to get some help creating a CSV from a pandas DataFrame. Here is my code:

a = ldamallet[bow_corpus_new[:21]]
b = data_text_new

print(a)
print("\n")
print(b)

d = {'Preprocessed Document': b['Preprocessed Document'].tolist(), 
     'topic_0': a[0][1], 
     'topic_1': a[1][1], 
     'topic_2': a[2][1], 
     'topic_3': a[3][1], 
     'topic_4': a[4][1], 
     'topic_5': a[5][1], 
     'topic_6': a[6][1], 
     'topic_7': a[7][1], 
     'topic_8': a[8][1], 
     'topic_9': a[9][1], 
     'topic_10': a[10][1],
     'topic_11': a[11][1], 
     'topic_12': a[12][1],
     'topic_13': a[13][1],
     'topic_14': a[14][1],
     'topic_15': a[15][1],
     'topic_16': a[16][1],
     'topic_17': a[17][1],
     'topic_18': a[18][1],
     'topic_19': a[19][1]}

print(d)

df = pd.DataFrame(data=d)
df.to_csv("test.csv", index=False)

The data:

print(a): a list with one entry per row, where each row is a list of (topic number, topic percentage) tuples

[[(topic number: 0, topic percentage),...(19, #)], [(topic distribution for next row, #)...(19, .819438),...(#,#),...]

[image: output of print(a)]

print(b):

[image: output of print(b)]

Here is my error:

[image: the error]

This is the size of the dataframe:

[image: the size of b]

The shape of a:

[image: shape of a]

This is what I wish it looked like:

[image: the dream]

Any help would be greatly appreciated :)

Upvotes: 1

Views: 540

Answers (2)

Sara

Reputation: 1202

I took @mattcremeens's advice and it worked. I've posted the full code below. He was right about nixing the tuples: my previous code wasn't iterating through the rows, so each column got a single tuple instead of one value per row.

topic_0=[]
topic_1=[]
topic_2=[]
topic_3=[]
topic_4=[]
topic_5=[]
topic_6=[]
topic_7=[]
topic_8=[]
topic_9=[]
topic_10=[]
topic_11=[]
topic_12=[]
topic_13=[]
topic_14=[]
topic_15=[]
topic_16=[]
topic_17=[]
topic_18=[]
topic_19=[]


for i in a:
    topic_0.append(i[0][1])
    topic_1.append(i[1][1])
    topic_2.append(i[2][1])
    topic_3.append(i[3][1])
    topic_4.append(i[4][1])
    topic_5.append(i[5][1])
    topic_6.append(i[6][1])
    topic_7.append(i[7][1])
    topic_8.append(i[8][1])
    topic_9.append(i[9][1])
    topic_10.append(i[10][1])
    topic_11.append(i[11][1])
    topic_12.append(i[12][1])
    topic_13.append(i[13][1])
    topic_14.append(i[14][1])
    topic_15.append(i[15][1])
    topic_16.append(i[16][1])
    topic_17.append(i[17][1])
    topic_18.append(i[18][1])
    topic_19.append(i[19][1])
    
d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
     'topic_0': topic_0,
     'topic_1': topic_1,
     'topic_2': topic_2,
     'topic_3': topic_3,
     'topic_4': topic_4,
     'topic_5': topic_5,
     'topic_6': topic_6,
     'topic_7': topic_7,
     'topic_8': topic_8,
     'topic_9': topic_9,
     'topic_10': topic_10,
     'topic_11': topic_11,
     'topic_12': topic_12,
     'topic_13': topic_13,
     'topic_14': topic_14,
     'topic_15': topic_15,
     'topic_16': topic_16,
     'topic_17': topic_17,
     'topic_18': topic_18,
     'topic_19': topic_19}

df = pd.DataFrame(data=d)
# mode='a' appends to test.csv instead of overwriting it
df.to_csv("test.csv", index=False, mode='a')
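
For what it's worth, the twenty hand-written lists can probably be collapsed into a loop. This is only a minimal sketch, assuming every row of `a` holds exactly 20 (topic number, topic percentage) tuples in topic order; `rows` and `docs` are made-up stand-ins for `a` and `b['Preprocessed Document'].tolist()`, not my real data:

```python
import pandas as pd

# Hypothetical stand-ins for `a` and b['Preprocessed Document'].tolist():
# 3 documents, 20 topics, each row a list of (topic number, weight) tuples.
n_topics = 20
rows = [[(k, r + k / 100) for k in range(n_topics)] for r in range(3)]
docs = ["doc one", "doc two", "doc three"]

# Build one column per topic by taking the weight (second tuple item)
# at position k from every row.
d = {"Preprocessed Document": docs}
for k in range(n_topics):
    d[f"topic_{k}"] = [row[k][1] for row in rows]

df = pd.DataFrame(data=d)
print(df.shape)
```

Every column ends up with one value per document, which is what `pd.DataFrame` needs. If some rows can have fewer than 20 tuples, `row[k]` would raise an `IndexError`, so this sketch would need adjusting for that case.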

Upvotes: 0

Matt Cremeens

Reputation: 5151

It might be easiest to collect the second value of each tuple, across all of the rows, in its own list. Something like this:

topic_0 = []
topic_1 = []
topic_2 = []
# ...and so on
for i in a:
    topic_0.append(i[0][1])
    topic_1.append(i[1][1])
    topic_2.append(i[2][1])
    # ...and so on

Then you can make your dictionary like so:

d = {'Preprocessed Document': b['Preprocessed Document'].tolist(), 
     'topic_0': topic_0, 
     'topic_1': topic_1, 
     # etc.
     }
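
To see why the original dictionary fails, here is a small sketch on made-up data (two topics and three documents instead of your real `a` and `b`; the names `bad`, `good`, and `failed` are only for illustration). It assumes `a` is a plain list of rows of tuples, as described in the question:

```python
import pandas as pd

# Made-up stand-in for `a`: 3 rows, each a list of (topic number, weight) tuples.
a = [[(0, 0.1), (1, 0.9)], [(0, 0.4), (1, 0.6)], [(0, 0.7), (1, 0.3)]]
docs = ["x", "y", "z"]

# a[0][1] is ONE tuple taken from the first row, so this "column" has
# length 2 while the document column has length 3.
bad = {"Preprocessed Document": docs, "topic_0": a[0][1]}
try:
    pd.DataFrame(data=bad)
    failed = False
except ValueError:
    failed = True

# Looping over the rows instead gives one weight per document.
topic_0 = [row[0][1] for row in a]
good = pd.DataFrame({"Preprocessed Document": docs, "topic_0": topic_0})
print(failed, good.shape)
```

The mismatch in lengths is what pandas complains about. Once each topic list has one entry per row of `a`, the lengths line up with the document column.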

Upvotes: 1
