Reputation: 71
How can I retrieve the index of a row in a DataFrame? I am able to retrieve the index of rows selected with df.loc:
idx = data.loc[data.name == "Smith"].index
I can even retrieve the row index from df.loc by filtering on data.index, like this:
idx = data.loc[data.index == 5].index
However, I cannot retrieve the index directly from the row itself (i.e., from row.index, instead of df.loc[].index). I tried this code:
idx = data.iloc[5].index
The result of this code is the column names.
To provide context, the reason I need to retrieve the index of a specific row (instead of rows from df.loc) is that I want to use df.apply on each row. I plan to use df.apply to apply a function to each row and copy the data from the row immediately above it.
def retrieve_gender (row):
    # This is a panel data, whose only data in 2000 is already keyed in. Time-invariant data in later years are the same as those in 2000.
    if row["Year"] == 2000:
        pass
    elif row["Year"] == 2001: # To avoid complexity, let's use only year 2001 as example.
        idx = row.index # This is wrong code.
        row["Gender"] = row.iloc[idx-1]["Gender"]
    return row["Gender"]
data["Gender"] = data.apply(retrieve_gender, axis=1)
Upvotes: 1
Views: 12405
Reputation: 164613
apply gives series indexed by column labels
The problem with idx = data.iloc[5].index is that data.iloc[5] converts a row to a pd.Series object indexed by column labels.
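As a minimal sketch of that behaviour, the snippet below builds a small hypothetical DataFrame (the index labels 10, 11, 12 and the sample values are assumptions, not the asker's data) and inspects a single row. It suggests that .index on the row holds the column labels, while the row's own label is kept as the Series name.

```python
import pandas as pd

# Hypothetical sample data; the index labels 10, 11, 12 are chosen for illustration.
data = pd.DataFrame(
    {"name": ["Smith", "Jones", "Lee"], "Year": [2000, 2001, 2001]},
    index=[10, 11, 12],
)

row = data.iloc[1]  # selecting one row returns a pd.Series

column_labels = list(row.index)  # the Series is indexed by the column names
row_label = row.name             # the row's own index label is stored as the Series name

print(column_labels)
print(row_label)
```

Even with row.name available, the row Series holds only its own values, so it cannot be used to look up the row above it.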
In fact, what you are asking for does not work well via pd.DataFrame.apply: the series that feeds your retrieve_gender function is indexed by column labels, carries its row label only as row.name, and has no access to the previous row.
With Pandas, row-wise logic is inefficient and not recommended; it involves a Python-level loop. Use column-wise logic instead. Taking a step back, it seems you wish to implement 2 rules:
1. If Year is not 2001, leave Gender unchanged.
2. If Year is 2001, use Gender from the previous row.
np.where + shift
For the above logic, you can use np.where with pd.Series.shift:
data['Gender'] = np.where(data['Year'] == 2001, data['Gender'].shift(), data['Gender'])
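To see the np.where line in action, here is a small self-contained sketch. The four-row DataFrame is hypothetical sample data, assuming Year is stored as an integer and Gender is missing in the 2001 rows.

```python
import numpy as np
import pandas as pd

# Hypothetical panel data: Gender is known for 2000 and missing for 2001.
data = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2001],
    "Gender": ["F", None, "M", None],
})

# shift() moves every value down one row, so each row sees the value above it.
previous_gender = data["Gender"].shift()

# Where Year is 2001 take the previous row's value, otherwise keep the original.
data["Gender"] = np.where(data["Year"] == 2001, previous_gender, data["Gender"])

print(data)
```

Note that shift() works on the row order of the whole frame, so this sketch assumes each 2001 row sits directly below the matching 2000 row.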
mask + shift
Alternatively, you can use mask + shift:
data['Gender'] = data['Gender'].mask(data['Year'] == 2001, data['Gender'].shift())
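A short sketch of the mask variant, again on hypothetical sample data (the index labels 10 to 13 are an assumption added to show that the result stays a Series aligned to the original index):

```python
import pandas as pd

# Hypothetical panel data with a non-default index.
data = pd.DataFrame(
    {
        "Year": [2000, 2001, 2000, 2001],
        "Gender": ["F", None, "M", None],
    },
    index=[10, 11, 12, 13],
)

# Boolean Series marking the rows whose Gender should be replaced.
is_2001 = data["Year"] == 2001

# mask() replaces values where the condition is True with the shifted values.
data["Gender"] = data["Gender"].mask(is_2001, data["Gender"].shift())

print(data)
```

Because mask is a Series method, the assignment stays within pandas and keeps the index labels, whereas np.where returns a plain NumPy array.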
Upvotes: 0
Reputation: 596
With Pandas you can loop through your dataframe like this:
# Assumes df has a default RangeIndex (0, 1, 2, ...); start at 1 so that index - 1 is valid.
for index in range(1, len(df)):
    if df.loc[index, 'Year'] == 2001:
        df.loc[index, 'Gender'] = df.loc[index - 1, 'Gender']
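The loop above looks rows up by label, so it relies on the index being 0, 1, 2, and so on. If that does not hold, one possible variant is to loop over positions with iloc instead. The sketch below assumes a column named Year holding integers, as in the question, and uses hypothetical string labels for the index.

```python
import pandas as pd

# Hypothetical data with non-integer index labels.
df = pd.DataFrame(
    {"Year": [2000, 2001, 2000, 2001], "Gender": ["F", None, "M", None]},
    index=["a", "b", "c", "d"],
)

# iloc needs integer positions for the columns as well as the rows.
year_pos = df.columns.get_loc("Year")
gender_pos = df.columns.get_loc("Gender")

# Start at position 1 so that pos - 1 always refers to an existing row.
for pos in range(1, len(df)):
    if df.iloc[pos, year_pos] == 2001:
        df.iloc[pos, gender_pos] = df.iloc[pos - 1, gender_pos]

print(df)
```

This is still a Python-level loop, so it will likely be slow on large frames compared with a vectorised approach.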
Upvotes: 1