Reputation: 145
I am new to pandas and DataFrames and have run into an issue. The DataFrame.apply() method passes a row parameter to the provided function. However, I can't work out how to get the index value for that row from this row parameter.
An example
import numpy as np
from pandas import DataFrame

df = DataFrame({'a': np.random.randn(6),
                'b': ['foo', 'bar'] * 3,
                'c': np.random.randn(6)})
df = df.set_index('a')

def my_test2(row):
    return "{}.{}".format(row['a'], row['b'])

df['Value'] = df.apply(my_test2, axis=1)
Yields a KeyError
KeyError: ('a', u'occurred at index -1.16119852166')
The problem is that the row['a'] in the my_test2 method fails. If I don't do the df.set_index('a') it works fine, but I do want to have an index on a.
I tried duplicating column a (once as index, and once as a column) and this works, but this just seems ugly and problematic.
Any ideas on how to get the corresponding index value given the row object?
Many thanks in advance.
Upvotes: 5
Views: 9993
Reputation: 1457
I believe what you want is this:
def my_test(row):
    return "{}.{}".format(row.name, row['b'])
This works because:
"{}.{}".format("ham", "cheese")
returns
'ham.cheese'
and if you reference a single row, its name attribute holds that row's index value. For the example above:
df.iloc[0]
returns
b                             foo
c                        1.417726
Value    0.7842562355491481.foo
Name: 0.784256235549, dtype: object
so df.iloc[0].name is the value shown after "Name:", here 0.784256235549.
Therefore, for the ith row, this function is equivalent to looking up that row's index value and executing this command:
"{}.{}".format(df.iloc[i].name, df.iloc[i]['b'])
and apply with axis=1 does this for every row.
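Putting the pieces together, here is a minimal end-to-end sketch of the approach above. The seeded generator (rng) is my own assumption, added only so the run is repeatable. The question used unseeded np.random.randn, so the actual numbers will differ.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed for repeatability; the original data was unseeded.
rng = np.random.default_rng(0)

df = pd.DataFrame({'a': rng.standard_normal(6),
                   'b': ['foo', 'bar'] * 3,
                   'c': rng.standard_normal(6)})
df = df.set_index('a')  # 'a' is now the index, no longer a column

def my_test(row):
    # With axis=1, each row arrives as a Series whose .name is the
    # index label of that row, so row['a'] is not needed.
    return "{}.{}".format(row.name, row['b'])

df['Value'] = df.apply(my_test, axis=1)
print(df)
```

Each entry in the Value column should be that row's index value, followed by a dot and the row's b value, with no duplicated a column needed.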
Upvotes: 5