Reputation: 1202
Thanks for stopping by! I'm hoping to get some help creating a CSV from a pandas DataFrame. Here is my code:
a = ldamallet[bow_corpus_new[:21]]
b = data_text_new
print(a)
print("\n")
print(b)
d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
'topic_0': a[0][1],
'topic_1': a[1][1],
'topic_2': a[2][1],
'topic_3': a[3][1],
'topic_4': a[4][1],
'topic_5': a[5][1],
'topic_6': a[6][1],
'topic_7': a[7][1],
'topic_8': a[8][1],
'topic_9': a[9][1],
'topic_10': a[10][1],
'topic_11': a[11][1],
'topic_12': a[12][1],
'topic_13': a[13][1],
'topic_14': a[14][1],
'topic_15': a[15][1],
'topic_16': a[16][1],
'topic_17': a[17][1],
'topic_18': a[18][1],
'topic_19': a[19][1]}
print(d)
df = pd.DataFrame(data=d)
df.to_csv("test.csv", index=False)
The data:
print(a): a list with one entry per row, where each entry is a list of (topic number, topic percentage) tuples
[[(topic number: 0, topic percentage),...(19, #)], [(topic distribution for next row, #)...(19, .819438),...(#,#),...]
print(b)
This is the size of the dataframe:
This is what I would like it to look like:
Any help would be greatly appreciated :)
Upvotes: 1
Views: 540
Reputation: 1202
I took @mattcremeens's advice and it worked. I've posted the full code below. He was right about nixing the tuples: my previous code wasn't iterating through the rows; it only printed the first row.
topic_0=[]
topic_1=[]
topic_2=[]
topic_3=[]
topic_4=[]
topic_5=[]
topic_6=[]
topic_7=[]
topic_8=[]
topic_9=[]
topic_10=[]
topic_11=[]
topic_12=[]
topic_13=[]
topic_14=[]
topic_15=[]
topic_16=[]
topic_17=[]
topic_18=[]
topic_19=[]
for i in a:
    topic_0.append(i[0][1])
    topic_1.append(i[1][1])
    topic_2.append(i[2][1])
    topic_3.append(i[3][1])
    topic_4.append(i[4][1])
    topic_5.append(i[5][1])
    topic_6.append(i[6][1])
    topic_7.append(i[7][1])
    topic_8.append(i[8][1])
    topic_9.append(i[9][1])
    topic_10.append(i[10][1])
    topic_11.append(i[11][1])
    topic_12.append(i[12][1])
    topic_13.append(i[13][1])
    topic_14.append(i[14][1])
    topic_15.append(i[15][1])
    topic_16.append(i[16][1])
    topic_17.append(i[17][1])
    topic_18.append(i[18][1])
    topic_19.append(i[19][1])
d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
'topic_0': topic_0,
'topic_1': topic_1,
'topic_2': topic_2,
'topic_3': topic_3,
'topic_4': topic_4,
'topic_5': topic_5,
'topic_6': topic_6,
'topic_7': topic_7,
'topic_8': topic_8,
'topic_9': topic_9,
'topic_10': topic_10,
'topic_11': topic_11,
'topic_12': topic_12,
'topic_13': topic_13,
'topic_14': topic_14,
'topic_15': topic_15,
'topic_16': topic_16,
'topic_17': topic_17,
'topic_18': topic_18,
'topic_19': topic_19}
df = pd.DataFrame(data=d)
df.to_csv("test.csv", index=False, mode = 'a')
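For anyone who doesn't want to type out twenty lists, here is a minimal sketch of the same idea written as a loop. The sample data, `docs`, `num_topics`, and the helper name `topic_columns` are all made up for illustration; in my case `a` comes from `ldamallet` and there are 20 topics. It looks each topic up by its id rather than by position and falls back to 0.0 if a row is missing a topic, which is an assumption on my part and may not match what your model returns.

```python
# Hypothetical sample data standing in for `a` (the model output) and the documents.
a = [
    [(0, 0.10), (1, 0.70), (2, 0.20)],
    [(0, 0.55), (1, 0.05), (2, 0.40)],
]
docs = ["first document", "second document"]
num_topics = 3  # 20 in the code above


def topic_columns(rows, num_topics):
    """Return {'topic_k': [weight of topic k in each row]} for k in 0..num_topics-1."""
    columns = {f"topic_{k}": [] for k in range(num_topics)}
    for row in rows:
        weights = dict(row)  # {topic_id: weight} for this one document
        for k in range(num_topics):
            # Assumption: a topic missing from a row counts as weight 0.0.
            columns[f"topic_{k}"].append(weights.get(k, 0.0))
    return columns


# Same shape as the dictionary `d` above: one list per column, one entry per row.
d = {"Preprocessed Document": docs, **topic_columns(a, num_topics)}
print(d)
```

The resulting `d` can be passed to `pd.DataFrame(data=d)` exactly as in the code above.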
Upvotes: 0
Reputation: 5151
It might be easiest to collect the second value of each tuple, across all of the rows, in its own list. Something like this:
topic_0 = []
topic_1 = []
topic_2 = []
# ...and so on
for i in a:
    topic_0.append(i[0][1])
    topic_1.append(i[1][1])
    topic_2.append(i[2][1])
    # ...and so on
Then you can make your dictionary like so:
d = {'Preprocessed Document': b['Preprocessed Document'].tolist(),
'topic_0': topic_0,
'topic_1': topic_1,
# ...and so on
}
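To see why the indexing matters, here is a small sketch with made-up data (the values and the two-topic shape are assumptions, not your real output). It compares what the question's code put under `'topic_1'` with what a `'topic_1'` column actually needs.

```python
# Hypothetical stand-in for `a`: one list of (topic_id, weight) tuples per document.
a = [
    [(0, 0.25), (1, 0.75)],
    [(0, 0.5), (1, 0.5)],
    [(0, 0.125), (1, 0.875)],
]

# What the question's code used for 'topic_1': a single tuple from ONE document.
print(a[1][1])   # (1, 0.5)

# What a 'topic_1' column needs: topic 1's weight from EVERY document.
topic_1 = [row[1][1] for row in a]
print(topic_1)   # [0.75, 0.5, 0.875]
```

The list comprehension does the same job as the `for` loop with `append`, just for one topic at a time.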
Upvotes: 1