Vasile Alexandru Peşte
Vasile Alexandru Peşte

Reputation: 1308

Python pandas map CSV file

I want to "merge" two CSV files. I want to map the emails from the File 1 and get their respective userId from File 2 then I want to assign it to the respective emails of File 1

Example:

File 1

name, userId, email
john, null, [email protected]
alex, null, [email protected]
micheal, null, [email protected]
alex, null, [email protected]
john, null, [email protected]

File 2

name, userId, email
alex, 5, [email protected]
micheal, 10, [email protected]
john, 12, [email protected]

Output File

name, userId, email
john, 12, [email protected]
alex, 5, [email protected]
micheal, 10, [email protected]
alex, 5, [email protected]
john, 12, [email protected]

This is my code but this doesn't assign the userId of the respective email because emails are not ordered

import pandas as pd

df1 = pd.read_csv("file1.csv", sep=",")
df2 = pd.read_csv("file2.csv", sep=",", index_col=0)

df1["userId"] = df2["userId"].values

df1.to_csv("output.csv", sep=";")

Anyone can help me?

Upvotes: 2

Views: 3270

Answers (2)

HSchmachty
HSchmachty

Reputation: 306

Dataframe.merge

df1 = pd.read_csv("file1.csv", sep=",")
df1.columns = ['name', 'userid', 'email']
df2 = pd.read_csv("file2.csv", sep=",", index_col=0)
df1 = df1.drop(['userId'], axis=1)

result = pd.merge(df1, df2, on=['name','email'], how='right')

result.to_csv("output.csv", sep=";")

How I tested:

import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'alex', 'michael', 'alex', 'john'],
                    'userId': ['null', 'null', 'null', 'null', 'null'],
                    'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
                    }, columns=['name','userId','email'])

df2 = pd.DataFrame({'name': ['alex', 'michael', 'john'],
                    'userId': ['5', '10', '12'],
                    'email': ['[email protected]', '[email protected]', '[email protected]']
                    })

df1 = df1.drop(['userId'], axis=1)

result = pd.merge(df1, df2, on=['name','email'], how='right')

print(df1)
print(df2)

print(result)

Upvotes: 1

tormond
tormond

Reputation: 452

So I think if you drop the UserID field from File 1 and drop name from File 2. Then merge the two dataframes on email address and you should get the

Upvotes: 1

Related Questions