Reputation: 183

Python nested loop over pandas dataframe

I have a pandas data frame of 30 000 rows that looks like this:

ID     year    month    var1-var300    test   

1111   2017    7        ...            1      
1111   2017    9        ...            0      
2222   2017    6        ...            1      
2222   2017    6        ...            0      
2222   2016    6        ...            0      
3333   2017    3        ...            1      
3333   2017    3        ...            0     
3333   2015    8        ...            0      
...

Here is what I want to do for each row: if test=1, I would like to extract the variables "ID year month", loop over the entire data frame and if this combination of variables is found in any other row, assign 1 to a new variable 'check'. The final data frame should look like this:

ID     year    month    var1-var300    test   check
1111   2017    7        ...            1      0
1111   2017    9        ...            0      0
2222   2017    6        ...            1      1
2222   2017    6        ...            0      0
2222   2016    6        ...            0      0
3333   2017    3        ...            1      1
3333   2017    3        ...            0      0
3333   2015    8        ...            0      0
...

Here is some kind of pseudo-code I have imagined:

for line in df:
    if line['test']=1:
        I=line['ID']
        Y=line['year']
        MO=line['month']
        for row in df:
            if row['ID']=I & row['year']=Y & row['month']=MO:
                line['check']=1
                break

Any idea how to do a similar code that works in Python?

Upvotes: 0

Answers (4)

Prakriti Shaurya

Reputation: 197

You can make some changes as below and try:

for line in df:
    if line['test']==1:
        I=line['ID']
        Y=line['year']
        MO=line['month']
        for row in df:
            if row['ID']==I and row['year']==Y and row['month']==MO:
                line['check']=1
                break

Upvotes: 0

Xu Qiushi

Reputation: 1161

I think you can just use a transform to count group. Then you can get the result. Just two lines.

This is my solution.

Create Test Data:

import pandas as pd
ID = [1111, 1111, 2222, 2222, 2222, 3333, 3333, 3333]
year = [2017, 2017, 2017, 2017, 2016, 2017, 2017, 2015]
month = [7, 9, 6, 6, 6, 3, 3, 8]
test = [1, 0, 1, 0, 0, 1, 0, 0]
df = pd.DataFrame({
    "ID": ID,
    "year": year,
    "month": month,
    "test": test
})

Get the result:

df.loc[:, "group_count"] = df.groupby(["ID", "year", "month"]).transform("count").values
df.loc[:, "check"] = ((df["test"]>0) & (df["group_count"] > 1)).astype(int)

Upvotes: 1

a_gdevr

Reputation: 103

So you want a single column where it is indicated if the corresponding row's ID, year and month correspond to ID, year and month of a row with test == 1?

You iterate with iterrows():

to_check = []
for index, row in df.iterrows():
    if row['test']==1: # in your pseudocode, you use single =; that's for assigning variables
        to_ckeck.append([row['ID'], row['year'], row['month']])

check = []
for index, row in df.iterrows():
      if [row['ID'], row['year'], row['month']] in to_check:
          check.append(1)
      else:
          check.append(0)
df["check"] = check

Upvotes: 0

Nils Werner

Reputation: 36839

You should be able to invert your logic:

Group by ID and year
Do your check inside each group

def func(group):
    if len(group) > 1:
        group.loc[group['test'] == 1, 'check'] = 1
    return group

df = df.groupby(['ID', 'year']).apply(func)

Upvotes: 1

Python nested loop over pandas dataframe

Answers (4)

Related Questions