coder

Merge datasets using pandas

Below is some code I was given to join two datasets.

import pandas as pd
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

df= pd.read_csv("student/student-por.csv")
ds= pd.read_csv("student/student-mat.csv")

print("before merge")

print(df)
print(ds)

print("After merging:")

dq = pd.merge(df,ds,by=c("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))

print(dq)

I get this error:

Traceback (most recent call last):
  File "/Users/PycharmProjects/datamining/main.py", line 15, in <module>
    dq = pd.merge(df, ds,by=c ("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
NameError: name 'c' is not defined

Any help would be great; I've been messing about with it for a while. I believe the 'by=c' part is the issue.

Thanks

Answers (1)

Pivoshenko

Hi 👋🏻 Hope you are doing well!

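The error happens because of the c in the arguments of the merge call. c(...) is how you build a vector in R, but Python has no built-in called c, hence NameError: name 'c' is not defined. Also, pandas' merge function has a different signature from R's merge: it has no by argument. The equivalent is on, which accepts a column name or a list of column names (a regular Python list in square brackets) 🙂 So, in summary, it should look something like this: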

import pandas as pd

df = pd.read_csv("student/student-por.csv")
ds = pd.read_csv("student/student-mat.csv")

print("Before merge.")
print(df)
print(ds)

print("After merge.")
dq = pd.merge(
    left=df,
    right=ds,
    on=[
        "school",
        "sex",
        "age",
        "address",
        "famsize",
        "Pstatus",
        "Medu",
        "Fedu",
        "Mjob",
        "Fjob",
        "reason",
        "nursery",
        "internet",
    ],
)
print(dq)
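A minimal, self-contained sketch of what this merge does, using two tiny made-up DataFrames in place of the CSV files (the key columns school, sex and age come from the question; the grade column and all values are hypothetical). merge performs an inner join by default, so only rows whose key columns match in both frames are kept, and a non-key column that exists in both frames gets the suffixes _x and _y unless you pass suffixes:

```python
import pandas as pd

# Hypothetical stand-ins for student-por.csv and student-mat.csv
por = pd.DataFrame({
    "school": ["GP", "GP", "MS"],
    "sex": ["F", "M", "F"],
    "age": [15, 16, 17],
    "grade": [12, 14, 10],  # made-up non-key column present in both frames
})
mat = pd.DataFrame({
    "school": ["GP", "MS", "MS"],
    "sex": ["F", "F", "M"],
    "age": [15, 17, 18],
    "grade": [11, 9, 13],
})

# on= takes a list of column names (where R would use by = c(...))
merged = pd.merge(
    por,
    mat,
    on=["school", "sex", "age"],
    suffixes=("_por", "_mat"),  # used instead of the default ("_x", "_y")
)
print(merged)
```

Here only the two rows whose school, sex and age match in both frames survive, and the result has the columns school, sex, age, grade_por and grade_mat.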

Docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
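Because the default is an inner join, rows that appear in only one of the files are silently dropped. If you want to check how many rows actually matched, the how and indicator parameters described in those docs can help. A sketch with more made-up data (only the key columns matter here):

```python
import pandas as pd

# Hypothetical data with just two key columns
por = pd.DataFrame({"school": ["GP", "GP", "MS"], "age": [15, 16, 17]})
mat = pd.DataFrame({"school": ["GP", "MS", "MS"], "age": [15, 17, 18]})

# how="outer" keeps rows from both sides; indicator=True adds a "_merge"
# column saying whether each row is "left_only", "right_only" or "both"
check = pd.merge(por, mat, on=["school", "age"], how="outer", indicator=True)
print(check["_merge"].value_counts())
```

With this data, two key combinations are found in both frames, one only in the left frame and one only in the right.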
