I read my CSV file with this:
dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 3)
When I print my dataset, it looks like this:
lyrics,classification
0 "I should have known better with a girl like you
1 That I would love everything that you do
2 And I do, hey hey hey, and I do
3 Whoa, whoa, I
4 Never realized what I kiss could be
5 This could only happen to me
6 Can't you see, can't you see
7 That when I tell you that I love you, oh
8 You're gonna say you love me too, hoo, hoo, ho...
9 And when I ask you to be mine
10 You're gonna say you love me too
11 So, oh I never realized what I kiss could be
12 Whoa whoa I never realized what I kiss could be
13 You love me too
14 You love me too",0
But what I really need is to have everything that's between the double quotes ("") in a single row. How do I make this conversion in pandas?
Upvotes: 2
Views: 653
Reputation: 18995
Fixing the problem at its source (in read_csv):
@nbeuchat is probably right; just try:
dataset = pd.read_csv('lyrics.csv', quoting = 2)
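That should give you a dataframe with one row and two columns: lyrics (a single string with embedded line breaks) and classification (0). The original call put every line in its own row because delimiter = '\t' does not match a comma-separated file, and quoting = 3 (csv.QUOTE_NONE) makes the parser treat the double quotes as ordinary characters.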
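To check this without touching the real file, here is a minimal sketch that parses a shortened, in-memory copy of the data. The csv_text string is a made-up stand-in for lyrics.csv, and the call simply leaves delimiter and quoting at their defaults (comma and csv.QUOTE_MINIMAL) rather than passing quoting = 2, which is an assumption on my part about what is sufficient here:

```python
import io

import pandas as pd

# Hypothetical, shortened stand-in for lyrics.csv: one quoted field that
# spans several lines, followed by the classification label.
csv_text = (
    'lyrics,classification\n'
    '"I should have known better with a girl like you\n'
    'That I would love everything that you do\n'
    'And I do, hey hey hey, and I do",0\n'
)

# Default delimiter (',') and default quoting: the double quotes are
# honoured, so the line breaks and commas inside them stay in one field.
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.shape)
print(dataset.loc[0, 'lyrics'])
```

If the shape shows a single row with two columns, the quoted block was read as one field and no further conversion is needed.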
If you would rather join the rows of a dataframe you have already loaded, you want to use pd.Series.str.cat:
import pandas as pd
dataset = pd.DataFrame({'lyrics': pd.Series(['happy birthday to you',
                                             'happy birthday to you',
                                             'happy birthday dear outkast',
                                             'happy birthday to you'])})
dataset['lyrics'].str.cat(sep=' / ')
# 'happy birthday to you / happy birthday to you / happy birthday dear outkast / happy birthday to you'
The default sep is None, which would give you 'happy birthday to youhappy birthday to youhappy ...', so pick the sep value that works for you. Above I used slashes (padded with spaces) since that's what you typically see in quotations of songs and poems.
You can also try print(dataset['lyrics'].str.cat(sep='\n')), which maintains the line breaks but stores them all in one string instead of one string per line.
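Applied to the frame from the question, the same idea could look roughly like the sketch below. It assumes the frame holds exactly one song, that its single column is named 'lyrics,classification' (as in the printed output), and that the label follows the last comma. The raw frame is a shortened, hypothetical stand-in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the mis-parsed frame: one row per lyric line,
# with the quotes and the trailing label still embedded in the text.
raw = pd.DataFrame({'lyrics,classification': [
    '"I should have known better with a girl like you',
    'That I would love everything that you do',
    'You love me too",0',
]})

# Join all rows into one string, keeping the original line breaks.
joined = raw['lyrics,classification'].str.cat(sep='\n')

# Split the label off at the last comma, then drop the surrounding quotes.
text, label = joined.rsplit(',', 1)
dataset = pd.DataFrame({'lyrics': [text.strip('"')],
                        'classification': [int(label)]})

print(dataset.shape)
```

This only works for a single quoted block; with several songs in the file, fixing the read_csv call is the more robust route.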
Upvotes: 1