Jeremy Lin
Jeremy Lin

Reputation: 258

Merging two csv files if matched column then concatenate extra columns

I am looking for some assistance with pandas. I am trying to compare the "email" column for any matching and then concatenate the extra columns into a separate csv.

I have the following:

file1.csv

email, user_id
[email protected], 1
[email protected], 24
[email protected], 12
[email protected], 31

file2.csv

email, group_id
[email protected], g1
[email protected], g1
[email protected], g2
[email protected], g3

output.csv:

email, user_id, group_id
[email protected], 1, g1
[email protected], 24, g1
[email protected], 12, g2

I appreciate any assistance provided. Thank You.

Upvotes: 1

Views: 322

Answers (3)

Karl Olufsen
Karl Olufsen

Reputation: 186

As far as I understood, you were trying to achieve something like that:

import pandas as pd

csv1 = pd.DataFrame({"email":["[email protected]", "[email protected]", "[email protected]", "[email protected]"],"user_id":[1,24,12,31]})
csv2 = pd.DataFrame({"email":["[email protected]", "[email protected]", "[email protected]", "[email protected]"],"group_id":["g1","g1","g2","g3"]})
csv3 = pd.DataFrame({"email":["[email protected]", "[email protected]", "[email protected]"],"user_id":[1,24,12],"group_id":["g1","g1","g2"]})

list_with_dataframes = [csv2,csv3]

result = csv1
for dataframe in list_with_dataframes:
    result = result.merge(dataframe, how = "outer")

print(result)

And that gives an output:

                email  user_id group_id
0  [email protected]      1.0       g1
1  [email protected]     24.0       g1
2  [email protected]     12.0       g2
3  [email protected]     31.0      NaN
4  [email protected]      NaN       g3

I hope this is what you are looking for. All matching data is retained, all missing data is changed with NaN. If you want only fully matched data to be retained simply replace "outer" with "inner" in this line:

result = result.merge(dataframe, how = "inner")

That gives an output:

                email  user_id group_id
0  [email protected]        1       g1
1  [email protected]       24       g1
2  [email protected]       12       g2

Upvotes: 2

kbyun
kbyun

Reputation: 442

Try Merge

import pandas as pd 
f1 = pd.read_csv('f1.csv')
f2 = pd.read_csv('f2.csv')
out = pd.merge(f1,f2,on='email',how='inner')
print(out)
out.to_csv("final.csv", index=False)

The output is

email   user_id  group_id
0  [email protected]         1        g1
1  [email protected]        24        g1
2  [email protected]        12        g2

Upvotes: 3

whege
whege

Reputation: 1441

Yep, this is as simple as doing a merge:

output = pd.merge(left=file1, right=file2, on='email', how='inner')

Setting 'how' to 'inner' ensures that only the matches in both files are kept

Upvotes: 1

Related Questions