gleb1783
gleb1783

Reputation: 461

Merge/Concat Issue

I have two Dataframes like the following:

DataFrame 1

ID  VALUE        DATE
1   google.com   12/28/2015
2   yahoo.com    12/28/2015
3   cnn.com      12/28/2015
4   facebook.com 12/28/2105

DataFrame 2

ID  COMMENT
1   Bad Stuff 
2   Good Stuff
3   Werid Stuff
4   Crazy Stuff

And the desired result will be here

ID   VALUE        DATE          COMMENT
1    google.com   12/28/2015    Bad Stuff
2    yahoo.com    12/28/2015    Good Stuff
3    cnn.com      12/28/2015    Weird Stuff
4    facebooo.com 12/28/2015    Crazy Stuff

Looks and seems so simple to me but I can't quite figure it out. I've read the Pandas documentation up and down and searched SOF for a while.

Here's what I have right now.

#Now, let's try to merge this data using Pandas
indicatorPanda = pd.read_csv('/tmp/i.csv')
commentPanda = pd.read_csv(open('/tmp/e.csv', 'rU'), encoding='utf-8')
mergedPanda = pd.concat([indicatorPanda, commentPanda], axis=1)
mergedPanda.to_csv('/tmp/output.csv', index=False, header=False)

I've come close but it looks like concating on Axis 1 still keeps that second 'ID' column at the end.

EDIT: So I'm close now, I am using this:

    mergedPanda = indicatorPanda.merge(commentPanda, on='id', how='left')

But I am having an issue with duplicate rows with the same ID because of multiple lines in the Dataframe2

So now, here's what I want my end result to be:

DataFrame 1

ID  VALUE        DATE
1   google.com   12/28/2015
2   yahoo.com    12/28/2015
3   cnn.com      12/28/2015
4   facebook.com 12/28/2105

DataFrame 2

ID  COMMENT
1   Bad Stuff
1   Much Worse Stuff 
2   Good Stuff
3   Werid Stuff
4   Crazy Stuff

And the desired result will be here

ID   VALUE        DATE          COMMENT
1    google.com   12/28/2015    Bad Stuff | Much Worse Stuff
2    yahoo.com    12/28/2015    Good Stuff
3    cnn.com      12/28/2015    Weird Stuff
4    facebook.com 12/28/2015    Crazy Stuff

Is there a way to just concat the comments onto each other with some sort of character? I know we are out of the realm of regular SQL and Panda.

Upvotes: 3

Views: 77

Answers (2)

Stefan
Stefan

Reputation: 42875

It looks like your DataFrames don't have ID set as index, so .merge() would work as (see docs):

df3 = dataframe1.merge(dataframe2, on='ID', how='left)

Your pd.concat() merges on index, so it should work fine (axis=1 means horizontal, as opposed to vertical combination) if you did df.set_index('ID', inplace=True) for both DataFrame before merging (see docs).

Upvotes: 2

Paul J
Paul J

Reputation: 66

I think Merge would work well for your case.

 dataframe3 = pd.merge(dataframe1, dataframe2, left_index=True, right_index=True)

Upvotes: 2

Related Questions