Franci

Reputation: 2247

How to assign values to a pandas column?

I have a DataFrame in which one column looks like this:

{'university': ['A', 'B', 'A', 'C']}

I want to change the column into:

{'university': [1, 2, 1, 3]}

according to a mapping dict such as:

{'A':1,'B':2,'C':3}

How can I get this done?

PS: I solved the original problem; it was caused by a setting on my own computer. I have changed the question accordingly to make it more helpful.

Upvotes: 0

Views: 769

Answers (2)

jezrael

Reputation: 862691

I think you need map with a dict d:

df.university = df.university.map(d)

If you need to encode the object as an enumerated type or categorical variable, use factorize:

df.university = pd.factorize(df.university)[0] + 1

Sample:

import pandas as pd

d = {'A':1,'B':2,'C':3}
df = pd.DataFrame({'university':['A','B','A','C']})
df['a'] = df.university.map(d)
df['b'] = pd.factorize(df.university)[0] + 1
print(df)
  university  a  b
0          A  1  1
1          B  2  2
2          A  1  1
3          C  3  3
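
One caveat with map, shown here as a minimal sketch: a value that has no key in the dict comes back as NaN. The value 'D' and the placeholder 0 below are assumptions made up for illustration; they are not part of the question.

```python
import pandas as pd

d = {'A': 1, 'B': 2, 'C': 3}
# 'D' is a hypothetical value with no entry in d
s = pd.Series(['A', 'B', 'D'], name='university')

# values missing from the dict become NaN, so the result is float
mapped = s.map(d)

# one possible fix: fill the gaps with a placeholder (0 is arbitrary) and cast back
filled = mapped.fillna(0).astype(int)

print(mapped)
print(filled)
```

Whether a placeholder is appropriate depends on the data; adding the missing keys to the dict is often the cleaner choice.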

I tried to rewrite your function:

def given_value(column):
    # unique values, in order of first appearance
    columnlist = column.drop_duplicates()
    # reset to the default monotonically increasing index (0, 1, 2, ...)
    columnlist = columnlist.reset_index(drop=True)
    # swap index and values into a new Series, columnlist_rev
    columnlist_rev = pd.Series(columnlist.index, index=columnlist.values)
    # map the original values to their integer codes
    return column.map(columnlist_rev)

print(given_value(df.university))
0    0
1    1
2    0
3    2
Name: university, dtype: int64
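
As a quick check, the function above should give the same zero-based codes as pd.factorize, since both number the values in order of first appearance. This is only a sketch on the sample data, and it assumes the column has no missing values:

```python
import pandas as pd

def given_value(column):
    # same logic as the rewritten function, repeated so this runs on its own
    uniques = column.drop_duplicates().reset_index(drop=True)
    lookup = pd.Series(uniques.index, index=uniques.values)
    return column.map(lookup)

df = pd.DataFrame({'university': ['A', 'B', 'A', 'C']})

# factorize returns the integer codes and the unique values
codes, uniques = pd.factorize(df['university'])

# compare the two encodings element by element
same = bool((given_value(df['university']).to_numpy() == codes).all())
print(same)
```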

Upvotes: 1

spicypumpkin

Reputation: 1219

AttributeError: 'DataFrame' object has no attribute 'column'

Your answer is written in the exception message! A DataFrame object doesn't have an attribute called column, which means you can't access DataFrame.column at any point in your code. I believe your problem lies outside of what you have posted here, most likely near the part where you first imported the data as a DataFrame. My guess is that when you were naming the columns, you did something like df.column = [university] instead of df.columns = [university]. The s matters. If you read the traceback closely, you'll be able to figure out precisely which line is throwing the error.

Also, in your posted function, you do not need the parameter df as it is not used at any point during the process.
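
To illustrate the columns versus column difference, here is a small sketch. The DataFrame is made up for the example and is not the asker's actual data:

```python
import pandas as pd

df = pd.DataFrame({'university': ['A', 'B', 'A', 'C']})

# the real attribute, with the trailing s
print(list(df.columns))

message = None
try:
    # no such attribute, and no column named 'column' either
    df.column
except AttributeError as exc:
    message = str(exc)
    print(message)
```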

Upvotes: 1
