Reputation: 83
I have a data frame like the one below:
A | B |
---|---|
1 | 12 |
84 | 15 |
51 | 42 |
2 | 10 |
Each value is the position of a string in a list. For example, given the list A=[Cat, Dog, Cow, ...], the first value in column A (1) should become Dog. How can I replace these values in the data frame quickly? The data frame has more than 1 million rows. I wrote the code below, but it takes ages to run:
for i in range(len(df)):
    a = df.iloc[i, 0]
    df.iloc[i, 0] = A[a]
    b = df.iloc[i, 1]
    df.iloc[i, 1] = B[b]
Upvotes: 1
Views: 354
Reputation: 10624
You can use NumPy, whose vectorized indexing is much faster than looping over a pandas data frame row by row. Try the following:
import numpy as np
valsA=['Cat', 'Dog', 'Cow'] * 100
valsA=np.array(valsA)
valsB=['Dog', 'Cat', 'Cow'] * 100
valsB=np.array(valsB)
df['A']=valsA.take(df['A'])
df['B']=valsB.take(df['B'])
>>> print(df)
A B
0 Dog Dog
1 Cat Dog
2 Cat Dog
3 Cow Cat
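As a self-contained sketch of the same idea on the four rows from the question: the lookup lists below are hypothetical placeholders (the real contents of A and B are not shown in the question), and plain NumPy integer indexing is used here, which for in-range indices should give the same result as take.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup lists; the question does not show the real A and B.
A = [f"a{i}" for i in range(100)]
B = [f"b{i}" for i in range(100)]

# The four rows shown in the question.
df = pd.DataFrame({"A": [1, 84, 51, 2], "B": [12, 15, 42, 10]})

# Convert each list to an array once, then index it with the whole column.
lookup_a = np.array(A)
lookup_b = np.array(B)

df["A"] = lookup_a[df["A"].to_numpy()]
df["B"] = lookup_b[df["B"].to_numpy()]

print(df)
```

An index beyond the end of a lookup list raises an IndexError here, so it may be worth checking the column maximum against the list length first.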
Upvotes: 1
Reputation: 1815
You can use apply.
With axis=1, a function that returns a list normally makes apply return a pd.Series of lists, but with result_type='expand' the list is unwrapped into columns and apply returns a pd.DataFrame.
The example below illustrates this:
>>> A = ['Cat', 'Dog', 'Cow']
>>> B = ['Catb', 'Dogb', 'Cowb']
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2]] * 3, columns=['A', 'B'])
>>> df.apply(lambda x: [A[x['A']], B[x['B']]], axis=1, result_type='expand')
0 1
0 Dog Cowb
1 Dog Cowb
2 Dog Cowb
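To make the difference concrete, here is a small sketch that runs the same lookup with and without result_type='expand'. The helper name lookup is only illustrative, and the final rename is an optional extra step, since the expanded frame otherwise has columns 0 and 1.

```python
import pandas as pd

A = ['Cat', 'Dog', 'Cow']
B = ['Catb', 'Dogb', 'Cowb']
df = pd.DataFrame([[1, 2]] * 3, columns=['A', 'B'])


def lookup(row):
    # Illustrative helper: translate both positions of one row into strings.
    return [A[row['A']], B[row['B']]]


# Default behaviour: one list per row, collected in a Series.
as_series = df.apply(lookup, axis=1)

# result_type='expand': each list element becomes its own column.
expanded = df.apply(lookup, axis=1, result_type='expand')

# Restore the original column names (the expanded frame uses 0 and 1).
expanded.columns = df.columns

print(type(as_series).__name__)
print(expanded)
```

Note that apply with axis=1 still calls the function once per row, so on a million rows it is unlikely to be fast.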
Another option is to use map without a lambda (see "List comprehension vs map"):
>>> df['A'] = df['A'].map(A.__getitem__)
>>> df['B'] = df['B'].map(B.__getitem__)
Upvotes: 0
Reputation: 1151
I don't believe your code is particularly bad from an efficiency point of view; with such a large dataframe, it is likely to take a while regardless.
That said, I would suggest that the code below is a more elegant way to apply a function to a column in a dataframe:
df['A'] = df['A'].map(lambda x: A[x])
df['B'] = df['B'].map(lambda x: B[x])
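One way to check how the approaches in this thread compare on your own machine is a rough timing sketch like the one below. Everything in it is an assumption for illustration: the lookup lists, the random data, the function names, and the small row count (kept at 2,000 so the loop finishes quickly). Timings will vary, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data standing in for the real lists and data frame.
rng = np.random.default_rng(0)
A = [f"a{i}" for i in range(100)]
B = [f"b{i}" for i in range(100)]
n_rows = 2_000
base = pd.DataFrame({
    "A": rng.integers(0, len(A), n_rows),
    "B": rng.integers(0, len(B), n_rows),
})


def with_loop(df):
    # Row-by-row replacement, as in the question (on an object-dtype copy
    # so that strings can be written into the formerly integer columns).
    out = df.astype(object)
    for i in range(len(out)):
        out.iloc[i, 0] = A[out.iloc[i, 0]]
        out.iloc[i, 1] = B[out.iloc[i, 1]]
    return out


def with_map(df):
    # Per-element lookup through Series.map.
    return pd.DataFrame({
        "A": df["A"].map(A.__getitem__),
        "B": df["B"].map(B.__getitem__),
    })


def with_take(df):
    # Vectorized lookup through NumPy.
    return pd.DataFrame({
        "A": np.array(A).take(df["A"].to_numpy()),
        "B": np.array(B).take(df["B"].to_numpy()),
    })


results = {}
for name, func in [("loop", with_loop), ("map", with_map), ("take", with_take)]:
    start = time.perf_counter()
    results[name] = func(base)
    print(f"{name}: {time.perf_counter() - start:.4f} s")
```

All three functions return new frames and leave base untouched, so the results can be compared for equality before trusting the timings.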
Upvotes: 1