Reputation: 23
Below is some code that was given to me to join two datasets.
import pandas as pd
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
df= pd.read_csv("student/student-por.csv")
ds= pd.read_csv("student/student-mat.csv")
print("before merge")
print(df)
print(ds)
print("After merging:")
dq = pd.merge(df,ds,by=c("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
print(dq)
I get this error:
Traceback (most recent call last):
File "/Users/PycharmProjects/datamining/main.py", line 15, in <module>
dq = pd.merge(df, ds,by=c ("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
NameError: name 'c' is not defined
Any help would be great, I've tried messing about with it for a while. I believe the 'by=c' is the issue.
Thanks
Upvotes: 0
Reputation: 359
Hi 👋🏻 Hope you are doing well!
The error happens because c(...) is R syntax: Python has no built-in function named c, hence the NameError. The by argument also comes from R's merge; the pandas merge function does not accept by and uses on instead, which takes a column name or a list of column names 🙂 So in summary it should look something like this:
import pandas as pd
df = pd.read_csv("student/student-por.csv")
ds = pd.read_csv("student/student-mat.csv")
print("Before merge.")
print(df)
print(ds)
print("After merge.")
dq = pd.merge(
    left=df,
    right=ds,
    on=[
        "school",
        "sex",
        "age",
        "address",
        "famsize",
        "Pstatus",
        "Medu",
        "Fedu",
        "Mjob",
        "Fjob",
        "reason",
        "nursery",
        "internet",
    ],
)
print(dq)
Docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
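If you want to check how on behaves without the CSV files, here is a minimal sketch. The two small frames and the G3 column values are made up for illustration and are not taken from the real student data. I am also assuming that both files share non-key columns with the same name, which is why the sketch passes suffixes so you can tell the two sources apart after the merge.

```python
import pandas as pd

# Made-up rows standing in for student-por.csv and student-mat.csv.
por = pd.DataFrame({
    "school": ["GP", "GP", "MS"],
    "sex": ["F", "M", "F"],
    "age": [18, 17, 16],
    "G3": [11, 12, 14],
})
mat = pd.DataFrame({
    "school": ["GP", "MS", "MS"],
    "sex": ["F", "F", "M"],
    "age": [18, 16, 17],
    "G3": [6, 10, 15],
})

# A plain Python list replaces R's c(...); `on` replaces R's `by`.
keys = ["school", "sex", "age"]

# The default is an inner join: only rows whose key columns match in
# both frames are kept. The overlapping non-key column G3 gets a suffix.
merged = pd.merge(por, mat, on=keys, suffixes=("_por", "_mat"))
print(merged)
```

Only the rows whose school, sex and age match in both frames end up in the result. If you also need the unmatched rows, merge has a how argument (for example how="outer") for that.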
Upvotes: 1