SVK

Reputation: 1034

How to transform sklearn TF-IDF vectorizer output in pandas into a meaningful format

I used sklearn to compute TF-IDF scores for my corpus, but the output is not in the format I need.

Code:

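import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

vect = TfidfVectorizer(ngram_range=(1, 3))
tfidf_matrix = vect.fit_transform(df_doc_wholetext['csv_text'])

# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
df = pd.DataFrame(tfidf_matrix.toarray(), columns=vect.get_feature_names_out())

df['filename'] = df.index  # row position used as the file identifier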

What I have:

(screenshot of the current DataFrame; image not preserved)

word1, word2, and word3 are placeholders; the actual columns can be any words in the corpus.

What I need:

(screenshot of the desired output; image not preserved)

I tried transforming it, but that converts all the columns to rows. Is there a way to achieve this?

Upvotes: 1

Views: 349

Answers (1)

Stef

Reputation: 30589

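# Keep only the word columns, then stack them into one (row, column, value) record per score
df1 = df.filter(like='word').stack().reset_index()
# The row index (equal to filename here) becomes the first column
df1.columns = ['filename','word_name','score']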

Output:

   filename word_name  score
0         0     word1   0.01
1         0     word2   0.04
2         0     word3   0.05
3         1     word1   0.02
4         1     word2   0.99
5         1     word3   0.07

Update for general column headers (this assumes filename is the first column of df):

df1 = df.iloc[:,1:].stack().reset_index()
df1.columns = ['filename','word_name','score']
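
Note that the code in the question appends filename as the last column, so a positional slice would need adjusting in that case. A minimal sketch that does not rely on column position is to move filename into the index before stacking. The small frame below is a made-up stand-in for the real TF-IDF output; the words and scores are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the TF-IDF frame from the question;
# the column names and scores are invented for illustration.
df = pd.DataFrame(
    {"apple": [0.01, 0.02], "banana pie": [0.04, 0.99], "cherry": [0.05, 0.07]}
)
df["filename"] = df.index  # appended as the last column, as in the question

# Move filename into the index so that only the score columns are stacked,
# regardless of where the filename column sits.
long_df = df.set_index("filename").stack().reset_index()
long_df.columns = ["filename", "word_name", "score"]

print(long_df)
```

If row order does not matter, `df.melt(id_vars="filename", var_name="word_name", value_name="score")` should give the same records, grouped by word instead of by file.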

Upvotes: 2
