Slavatron

Reputation: 2358

Pandas - Add column containing metadata about the row

I want to add a column to a DataFrame that holds a number derived from each row's contents, specifically: one less than the number of non-NaN values in that row.

I tried:

for index, row in df.iterrows():
    count = row.value_counts()
    val = sum(count) - 1
    df['Num Hits'] = val

This produces a warning:

-c:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead

and puts the first val value into every cell of the new column. I've tried reading about .loc and indexing in the Pandas documentation but couldn't make sense of it. I gather that .loc wants a row_index and a column_index, but I don't know whether these already exist in every DataFrame and I just have to specify them, or whether I need to "set" an index on the DataFrame before telling the loop where to place the new value, val.

Upvotes: 0

Views: 402

Answers (2)

CT Zhu

Reputation: 54380

You can do this without an explicit loop, which is likely to be faster than the loop version:

In [89]:

print df
          0         1         2         3
0  0.835396  0.330275  0.786579  0.493567
1  0.751678  0.299354  0.050638  0.483490
2  0.559348  0.106477  0.807911  0.883195
3  0.250296  0.281871  0.439523  0.117846
4  0.480055  0.269579  0.282295  0.170642
In [90]:
#number of valid numbers - 1
df.apply(lambda x: np.isfinite(x).sum()-1, axis=1)
Out[90]:
0    3
1    3
2    3
3    3
4    3
dtype: int64

@DSM brought up a good point: the solution above is still not fully vectorized, because apply with axis=1 calls the lambda once per row. A fully vectorized form is simply (~df.isnull()).sum(axis=1)-1.
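As a minimal sketch of the vectorized approach, here it is on a small frame that actually contains NaN values. The sample data and the column names "a", "b", "c" are made up for illustration. df.count(axis=1) is used as an equivalent spelling of the boolean-mask form; both count the non-NaN cells per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; NaN marks a missing value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

# count(axis=1) returns the number of non-NaN cells in each row.
num_hits = df.count(axis=1) - 1

# The boolean-mask form: True where a cell is not NaN, summed across each row.
mask_form = df.notnull().sum(axis=1) - 1

# Compute the counts first, then attach them, so the new column
# is not itself included in the count.
df["Num Hits"] = num_hits
print(df)
```

Assigning a whole Series to a new column in one step also avoids the per-row writes that triggered the warning in the question.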

Upvotes: 1

snorthway

Reputation: 586

You can use the index variable that you define as part of the for loop as the row_index that .loc is looking for:

for index, row in df.iterrows():
    count = row.value_counts()
    val = sum(count) - 1
    df.loc[index, 'Num Hits'] = val
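To address the question of where the row labels come from: every DataFrame has an index, and if none was set explicitly the labels default to 0, 1, 2, and so on. The sketch below shows the .loc loop on made-up data (the column names are hypothetical). It uses Series.count() instead of value_counts(), which is an assumption-free swap since both ignore NaN, and it iterates over a fixed selection of the original columns so the new column is never counted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; NaN marks a missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, np.nan], "c": [3.0, 4.0]})

# No index was set explicitly, so the row labels default to 0, 1, ...
row_labels = list(df.index)

# Fix the set of data columns up front so the new column is never counted.
data_cols = list(df.columns)

for index, row in df[data_cols].iterrows():
    # Series.count() ignores NaN, so this is the number of non-NaN cells.
    # .loc[row_label, column_label] writes a single cell of df.
    df.loc[index, "Num Hits"] = row.count() - 1

print(df)
```

By contrast, df['Num Hits'] = val assigns one scalar to the entire column on every iteration, which is why the original loop left the same number in every cell.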

Upvotes: 0
