Reputation: 449
I am trying, for the first time, to iterate through a set of columns within a df, with the goal of creating two new columns:
1) one that totals up the number of 1's in the columns it iterated through
2) a second column that gets a 1 if any of the iterated columns has a 1 in it, and then breaks. *** I solved this piece ***
ADDED MY DF for context:
INDEX ID STATE Filed_Month ... MI HD Stroke Diabeties_all
0 0 20190 Alabama January ... 2.0 2.0 2.0 3.0
1 1 20191 Alabama January ... 2.0 2.0 2.0 3.0
2 2 20192 Alabama January ... 2.0 2.0 2.0 1.0
3 3 20193 Alabama January ... 2.0 2.0 2.0 3.0
4 4 20194 Alabama January ... 2.0 2.0 2.0 3.0
[5 rows x 13 columns]
I also should mention when totaling up the 1's I am only interested in 7 out of the 13 columns.
I was able to get the second part of my question using the following:
def ifanyCM(row):
    # Return 1 as soon as any condition column equals 1; otherwise return 0.
    if row["Asthma"] == 1:
        return 1
    if row["COPD_all"] == 1:
        return 1
    if row["Skin_Cancer"] == 1:
        return 1
    if row["Other_Cancer"] == 1:
        return 1
    if row["MI"] == 1:
        return 1
    if row["HD"] == 1:
        return 1
    if row["Stroke"] == 1:
        return 1
    if row["Diabeties_all"] == 1:
        return 1
    # No column held a 1, so only now is it safe to return 0.
    return 0

y2019_r1["CM"] = y2019_r1.apply(lambda row: ifanyCM(row), axis=1)
I am just struggling to total up the ones out of the 7 columns of interest.
Cheers,
Rachel
Upvotes: 1
Views: 66
Reputation: 68116
It's unclear what your dataframe looks like. But assuming it's a wide dataframe like this:
import pandas
import numpy

df = pandas.DataFrame({
    1: [1, 2, 1, 2, 7, 1, 1, 9],
    2: [2, 9, 2, 2, 2, 1, numpy.nan, numpy.nan]
}).fillna(0).astype(int).T.rename(
    index=lambda r: f"row{r}",
    columns=lambda r: f"col{r}",
)
col0 col1 col2 col3 col4 col5 col6 col7
row1 1 2 1 2 7 1 1 9
row2 2 9 2 2 2 1 0 0
Then you don't need any loops or .apply at all (because apply is essentially the same thing as a loop):
data = df.assign(
    CountOfOnes=lambda df: df.eq(1).sum(axis=1),
    HasAnyOne=lambda df: df.eq(1).any(axis=1)
)
Which is:
col0 col1 col2 col3 col4 col5 col6 col7 CountOfOnes HasAnyOne
row1 1 2 1 2 7 1 1 9 4 True
row2 2 9 2 2 2 1 0 0 1 True
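Since only some of the columns are of interest, the same idea can be restricted to a column subset. The following is a minimal sketch, not a definitive solution: the sample values are made up, the cols list is an assumption (substitute the actual seven columns), and the CM_total column name is hypothetical.

```python
import pandas as pd

# Hypothetical sample mimicking the question's layout (values are made up).
df = pd.DataFrame({
    "ID": [20190, 20191, 20192],
    "STATE": ["Alabama", "Alabama", "Alabama"],
    "Asthma": [1.0, 2.0, 2.0],
    "COPD_all": [2.0, 2.0, 2.0],
    "MI": [1.0, 2.0, 2.0],
    "HD": [2.0, 2.0, 2.0],
    "Stroke": [2.0, 2.0, 2.0],
    "Diabeties_all": [3.0, 3.0, 1.0],
})

# Assumed columns of interest; replace with your own list.
cols = ["Asthma", "COPD_all", "MI", "HD", "Stroke", "Diabeties_all"]

# Boolean frame: True wherever a selected column equals 1.
is_one = df[cols].eq(1)

# Row-wise count of 1s across the selected columns only.
df["CM_total"] = is_one.sum(axis=1)

# 1 if any selected column holds a 1, else 0.
df["CM"] = is_one.any(axis=1).astype(int)

print(df[["ID", "CM_total", "CM"]])
```

Selecting df[cols] first keeps columns such as ID out of the comparison, so a 1 in an unrelated column cannot inflate the count.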
Upvotes: 2
Reputation: 9619
An easy way to do it is to use df.apply() with a lambda function in pandas. When you specify axis=1, it processes each row of the dataframe as a Series. In the example below, I count the 1s in each row by taking the length of a list comprehension that keeps only the values equal to 1. I then use a second apply() on the column holding the total number of 1s to check whether it contains a value larger than 0.
import pandas as pd

l = [[1, 2, 1, 2, 7, 1], [2, 9, 2, 2, 2, 2]]
df = pd.DataFrame(l)

# Count the values equal to 1 in each row of the selected columns.
df['total 1s'] = df[[0, 1, 2, 3, 4, 5]].apply(lambda row: len([i for i in row if i == 1]), axis=1)

# True if the row contained at least one 1.
df['any 1s'] = df['total 1s'].apply(lambda x: x > 0)
Result:
+----+-----+-----+-----+-----+-----+-----+------------+----------+
| | 0 | 1 | 2 | 3 | 4 | 5 | total 1s | any 1s |
|----+-----+-----+-----+-----+-----+-----+------------+----------|
| 0 | 1 | 2 | 1 | 2 | 7 | 1 | 3 | True |
| 1 | 2 | 9 | 2 | 2 | 2 | 2 | 0 | False |
+----+-----+-----+-----+-----+-----+-----+------------+----------+
Upvotes: 1