Reputation: 2247
I have a DataFrame; one of its columns is:
{'university': ['A', 'B', 'A', 'C']}
I want to change that column into:
{'university': [1, 2, 1, 3]}
according to a dict such as:
{'A': 1, 'B': 2, 'C': 3}
How can I do this?
P.S. I solved the original problem; it was caused by a setting on my own computer. I have changed the question accordingly so it is more helpful.
Upvotes: 0
Views: 769
Reputation: 862691
I think you need map with the dict d:
df.university = df.university.map(d)
If you need to encode the values as an enumerated type or categorical variable, use factorize:
df.university = pd.factorize(df.university)[0] + 1
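One thing worth knowing: factorize numbers values in the order they first appear, not alphabetically, so it only reproduces a dict like {'A': 1, 'B': 2, 'C': 3} when the first occurrences happen to come in that order. A small sketch (the Series `s` is made-up illustration data) comparing the default with `sort=True`:

```python
import pandas as pd

s = pd.Series(['B', 'A', 'B', 'C'], name='university')

# Default: codes follow the order in which each value first appears
codes, uniques = pd.factorize(s)
print(codes + 1)        # [1 2 1 3]
print(list(uniques))    # ['B', 'A', 'C']

# sort=True: codes follow the sorted order of the unique values
codes_sorted, uniques_sorted = pd.factorize(s, sort=True)
print(codes_sorted + 1)      # [2 1 2 3]
print(list(uniques_sorted))  # ['A', 'B', 'C']
```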
Sample:
import pandas as pd

d = {'A':1,'B':2,'C':3}
df = pd.DataFrame({'university':['A','B','A','C']})
df['a'] = df.university.map(d)
df['b'] = pd.factorize(df.university)[0] + 1
print (df)
  university  a  b
0          A  1  1
1          B  2  2
2          A  1  1
3          C  3  3
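With map, any value that is not a key in the dict becomes NaN, which also turns the column into float64. A minimal sketch, assuming a hypothetical unmatched value 'D' and an arbitrary fallback code of 0 (pick whatever suits your data):

```python
import pandas as pd

d = {'A': 1, 'B': 2, 'C': 3}
s = pd.Series(['A', 'B', 'D'], name='university')

mapped = s.map(d)
print(mapped.tolist())  # [1.0, 2.0, nan]
print(mapped.dtype)     # float64

# Assumed fallback: give unmatched values code 0, then restore an integer dtype
filled = s.map(d).fillna(0).astype(int)
print(filled.tolist())  # [1, 2, 0]
```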
I tried to rewrite your function:
def given_value(column):
    columnlist = column.drop_duplicates()
    # reset to the default monotonic increasing index (0, 1, 2, ...)
    columnlist = columnlist.reset_index(drop=True)
    # print (columnlist)
    # swap index and values to create a new Series columnlist_rev
    columnlist_rev = pd.Series(columnlist.index, index=columnlist.values)
    # map by columnlist_rev
    column = column.map(columnlist_rev)
    return column
print (given_value(df.university))
0 0
1 1
2 0
3 2
Name: university, dtype: int64
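The rewritten function gives each value its 0-based position among the unique values in order of first appearance, which, for data without missing values, is the same numbering factorize produces (add 1 to either for 1-based codes). A quick check on the sample column, under that no-NaN assumption:

```python
import pandas as pd

def given_value(column):
    # Unique values in order of first appearance, re-indexed 0, 1, 2, ...
    columnlist = column.drop_duplicates().reset_index(drop=True)
    # Swap index and values: unique value -> 0-based position
    columnlist_rev = pd.Series(columnlist.index, index=columnlist.values)
    return column.map(columnlist_rev)

s = pd.Series(['A', 'B', 'A', 'C'], name='university')

via_function = given_value(s)
via_factorize = pd.Series(pd.factorize(s)[0], index=s.index, name=s.name)

print(via_function.tolist())   # [0, 1, 0, 2]
print(via_factorize.tolist())  # [0, 1, 0, 2]
```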
Upvotes: 1
Reputation: 1219
AttributeError: 'DataFrame' object has no attribute 'column'
The answer is in the exception message: a DataFrame object has no attribute called column, which means you can't access DataFrame.column anywhere in your code. I believe the problem lies outside the code you posted here, most likely near the part where you first imported the data into a DataFrame. My guess is that when you named the columns, you wrote something like df.column = ['university'] instead of df.columns = ['university']. The s matters. If you read the traceback closely, you'll be able to find exactly which line throws the error.
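A minimal illustration of the difference, using a made-up one-column DataFrame rather than the asker's actual data:

```python
import pandas as pd

df = pd.DataFrame([['A'], ['B'], ['A'], ['C']])

# Correct: `columns` (with the s) sets the DataFrame's column labels
df.columns = ['university']
print(df.columns.tolist())  # ['university']

# `column` (no s) is not a DataFrame attribute, so reading it raises
msg = None
try:
    df.column
except AttributeError as err:
    msg = str(err)
    print(msg)
```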
Also, your posted function does not need the parameter df, since it is never used.
Upvotes: 1