kerma

Reputation: 13

Adding a new column to a dataframe based on the values of another dataframe

I have two CSV files and I am using pandas to read them.

The train.csv contains values, with headers id, sentiment

87,Positive
10,Positive
7,Neutral

The text.csv contains values, with headers id, text

7,hello, I think the price if high...
87, you can call me tomorow...
....

I would like to insert the text from text.csv into train.csv so the result would be:

87,Positive, you can call me tomorow...

Can anyone help me do this with pandas?

import pandas as pd

train= pd.read_csv("train.csv")
text= pd.read_csv("text.csv")

# this does not work
combined= pd.merge(train, text, on=['id'])

Note: some ids may be missing from one of the files, so I need the value set to null when an id does not exist.

Upvotes: 0

Views: 34

Answers (2)

Arkadip Bhattacharya

Reputation: 642

One easy way is:

pd.merge(train, text, on='id', how='outer')

Per the pandas docs, how='outer' keeps the union of keys from both dataframes; rows without a match get NaN in the missing columns.
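To make the difference between the join types concrete, here is a minimal sketch. The in-memory dataframes are hypothetical stand-ins for train.csv and text.csv (the id 99 row is invented for illustration). If the goal is to keep every row of train and get null where no text exists, how='left' may be closer to what the question asks for than how='outer'.

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and text.csv
train = pd.DataFrame({"id": [87, 10, 7],
                      "sentiment": ["Positive", "Positive", "Neutral"]})
text = pd.DataFrame({"id": [7, 87, 99],
                     "text": ["hello", "you can call me", "unmatched"]})

# how="left": keep every row of train; text is NaN where the id has no match
left = pd.merge(train, text, on="id", how="left")

# how="outer": keep the union of ids from both frames
outer = pd.merge(train, text, on="id", how="outer")

print(left)
print(outer)
```

With this sample data, the left merge has one row per id in train, while the outer merge also includes id 99, which appears only in text.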

Upvotes: 0

Michael Delgado

Reputation: 15432

Set id as the index on both dataframes, then combine the columns; pandas aligns the rows on the index:

train.set_index('id').sentiment + text.set_index('id').text
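Note that + on two string columns concatenates them element-wise into a single Series rather than producing a second column. If separate sentiment and text columns are wanted, one index-based alternative is DataFrame.join, sketched below. The sample dataframes are hypothetical stand-ins for the two CSV files, and this is a swapped-in technique, not the expression given above.

```python
import pandas as pd

# Hypothetical stand-ins for train.csv and text.csv
train = pd.DataFrame({"id": [87, 10, 7],
                      "sentiment": ["Positive", "Positive", "Neutral"]})
text = pd.DataFrame({"id": [7, 87],
                     "text": ["hello", "you can call me tomorrow"]})

# join aligns on the index and defaults to how="left",
# so every train row is kept and unmatched ids get NaN in text
combined = train.set_index("id").join(text.set_index("id"))

print(combined)
```

Calling combined.reset_index() turns id back into a regular column if that layout is preferred.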

Upvotes: 1
