Reputation: 6753
I want to apply a function to an entire column in a Pandas dataframe. The function will overwrite the data currently in that column, but it requires the value of the adjacent column. To illustrate:
col 0, col 1,
23, 'word'
45, 'word2'
63, 'word3'
I have tried passing the number column into the Pandas apply method:
df[1] = df.apply(retrieve_original_string(df[0]), axis=1)
But this throws an error:
sys:1: DtypeWarning: Columns (3,4) have mixed types. Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
File "/home/noname365/similar_keywords_microsoft/similar_keywords.py", line 95, in <module>
merged_df[1] = merged_df.apply(retrieve_original_string(merged_df[0], match_df), axis=1)
File "/home/noname365/similar_keywords_microsoft/similar_keywords.py", line 12, in retrieve_original_string
row_num = int(row)
File "/home/noname365/virtualenvs/env35/lib/python3.5/site-packages/pandas/core/series.py", line 81, in wrapper
"cannot convert the series to {0}".format(str(converter)))
TypeError: cannot convert the series to <class 'int'>
The error implies that I am passing the whole number column to the function at once, instead of one value at a time, row by row. How would I accomplish the latter?
Upvotes: 3
Views: 328
Reputation: 862441
IIUC, you need iloc to select the second column, plus a lambda, as EdChum mentioned:
def retrieve_original_string(x):
    x = x + 4
    # add code
    return x
# the lambda is called once per row; x[0] is that row's value in column 0
df.iloc[:, 1] = df.apply(lambda x: retrieve_original_string(x[0]), axis=1)
print(df)
col 0 col 1
0 23 27
1 45 49
2 63 67
# if you need a new column instead
df['a'] = df.apply(lambda x: retrieve_original_string(x[0]), axis=1)
print(df)
col 0 col 1 a
0 23 'word' 27
1 45 'word2' 49
2 63 'word3' 67
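For a version that runs as-is, here is a minimal sketch of the row-wise approach. The sample data, the integer column labels 0 and 1, and the "add 4" body are placeholders I am assuming for illustration; the real retrieve_original_string is not shown in the question.

```python
import pandas as pd

# Sample data mirroring the question; integer column labels 0 and 1 are assumed.
df = pd.DataFrame({0: [23, 45, 63], 1: ["word", "word2", "word3"]})

def retrieve_original_string(x):
    # Placeholder logic standing in for the real lookup.
    return x + 4

# apply needs a callable. The lambda is invoked once per row (axis=1),
# and row[0] is the scalar in column 0 of that row.
df["a"] = df.apply(lambda row: retrieve_original_string(row[0]), axis=1)
print(df)
```

The original call failed because retrieve_original_string(df[0]) runs immediately with the whole column, before apply is ever involved; wrapping the call in a lambda defers it until apply supplies each row.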
Or:
def retrieve_original_string(x):
    x = x + 4
    # add code
    return x
# Series.apply passes each value of the first column to the function
df.iloc[:, 1] = df.iloc[:, 0].apply(retrieve_original_string)
print(df)
col 0 col 1
0 23 27
1 45 49
2 63 67
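The traceback in the question shows the function also taking a second argument (match_df). One way to handle that, sketched under assumptions, is the args parameter of Series.apply. The lookup table contents, the "original" column name, and the function body below are hypothetical, since the real ones are not shown.

```python
import pandas as pd

# Column 0 holds row numbers to look up; column 1 will be overwritten.
df = pd.DataFrame({0: [0, 2, 1], 1: ["?", "?", "?"]})

# Hypothetical lookup table standing in for match_df from the traceback.
match_df = pd.DataFrame({"original": ["alpha", "beta", "gamma"]})

def retrieve_original_string(row_num, lookup):
    # int() works here because row_num is a single scalar, not a whole Series.
    return lookup["original"].iloc[int(row_num)]

# Pass the function itself (no parentheses); extra positional
# arguments are forwarded to it through `args`.
df[1] = df[0].apply(retrieve_original_string, args=(match_df,))
print(df)
```

Because only one column is needed as input, applying on the Series avoids building a row object for every row, which is usually cheaper than df.apply with axis=1.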
Upvotes: 2