user7830303

Pandas - merge many rows into one

With this:

dataset = pd.read_csv('lyrics.csv', delimiter = '\t', quoting = 3)

I print my dataset in this fashion:

                                   lyrics,classification
0       "I should have known better with a girl like you
1               That I would love everything that you do
2                        And I do, hey hey hey, and I do
3                                          Whoa, whoa, I
4                    Never realized what I kiss could be
5                           This could only happen to me
6                           Can't you see, can't you see
7               That when I tell you that I love you, oh
8      You're gonna say you love me too, hoo, hoo, ho...
9                          And when I ask you to be mine
10                      You're gonna say you love me too
11          So, oh I never realized what I kiss could be
12       Whoa whoa I never realized what I kiss could be
13                                       You love me too
14                                    You love me too",0

But what I really need is to have everything that's between the "" in a single row. How do I make this conversion in pandas?

Upvotes: 2

Views: 653

Answers (1)

C8H10N4O2

Reputation: 18995

Solution that worked for OP (from comments):

Fixing the problem at its source (in read_csv):

@nbeuchat is probably right, just try

dataset = pd.read_csv('lyrics.csv', quoting = 2)

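That should give you a DataFrame with one row and two columns: lyrics (a single string with embedded line breaks) and classification (0). The original call used quoting = 3 (csv.QUOTE_NONE), which tells the parser to treat the quotes as ordinary characters, so every physical line became its own row; delimiter = '\t' also kept the comma from splitting the two columns.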
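To see the difference without the real file, here is a minimal sketch that parses a small in-memory stand-in for lyrics.csv both ways. The sample text and the names raw, broken, and fixed are made up for illustration, and the second call relies on the default quoting (csv.QUOTE_MINIMAL) rather than quoting = 2, on the assumption that either setting honors the quotes the same way for this file.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for lyrics.csv: one quoted, multi-line field plus a label.
raw = (
    'lyrics,classification\n'
    '"I should have known better\n'
    'That I would love everything",0\n'
)

# The original call: quotes are ignored, so each physical line becomes a row,
# and the tab delimiter leaves everything in a single column.
broken = pd.read_csv(io.StringIO(raw), delimiter='\t', quoting=csv.QUOTE_NONE)

# Default delimiter and quoting: the quoted field is kept as one value,
# line breaks included.
fixed = pd.read_csv(io.StringIO(raw))

print(broken.shape)
print(fixed.shape)
print(repr(fixed.loc[0, 'lyrics']))
```

The first shape should show one row per line of the song, the second a single row with two columns.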

General solution for collapsing series of strings:

You want to use pd.Series.str.cat:

import pandas as pd

dataset = pd.DataFrame({'lyrics': pd.Series(['happy birthday to you',
                                             'happy birthday to you',
                                             'happy birthday dear outkast',
                                             'happy birthday to you'])})
dataset['lyrics'].str.cat(sep=' / ')
# 'happy birthday to you / happy birthday to you / happy birthday dear outkast / happy birthday to you'

The default sep is None, which joins the strings with nothing in between and would give you 'happy birthday to youhappy birthday to youhappy ...', so pick the sep value that works for you. Above I used slashes (padded with spaces) since that's what you typically see in quotations of songs and poems.

You can also try print(dataset['lyrics'].str.cat(sep='\n')), which keeps the line breaks but stores all the lines in one string instead of one string per row.
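If the goal is the one-row layout from the question, a rough sketch of the last step could look like this. The frame lines, the variable names, and the classification value 0 are placeholders for illustration, not taken from the real data.

```python
import pandas as pd

# Hypothetical frame with one lyric line per row.
lines = pd.DataFrame({'lyrics': ['happy birthday to you',
                                 'happy birthday dear outkast']})

# Collapse the column into a single newline-separated string.
joined = lines['lyrics'].str.cat(sep='\n')

# Wrap it in a one-row frame; the label 0 is an assumed placeholder.
one_row = pd.DataFrame({'lyrics': [joined], 'classification': [0]})

print(one_row.shape)
print(repr(joined))
```

For a column of plain strings with no missing values, '\n'.join(lines['lyrics']) should produce the same string.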

Upvotes: 1
