Marco

Reputation: 2797

How to save the output of a while loop within a for loop in Python?

I have a for/while loop combination that checks value differences for person-year observations. The whole thing gives me a sequence of booleans as its outcome, which I need for further analysis.

I tried several versions of append, but none of them worked.

Here is my data:

import pandas as pd
df = pd.DataFrame({'year': ['2001', '2004', '2005', '2006', '2007', '2008', '2009',
                             '2003', '2004', '2005', '2006', '2007', '2008', '2009',
                            '2003', '2004', '2005', '2006', '2007', '2008', '2009'],
                   'id': ['1', '1', '1', '1', '1', '1', '1', 
                          '2', '2', '2', '2', '2', '2', '2',
                         '5', '5', '5','5', '5', '5', '5'],
                   'money': ['15', '15', '15', '21', '21', '21', '21', 
                             '17', '17', '17', '20', '17', '17', '17',
                            '25', '30', '22', '25', '8', '7', '12']}).astype(int)

Here is my code:

# for every person
for i in df.id.unique():
    # find the first and last index value
    first = df[df['id']==i].index.values.astype(int)[0] 
    last = df[df['id']==i].index.values.astype(int)[-1] 
    # first element has to be kept
    print(False)
    # for all elements, compare values next to each other
    while first < last:
        abs_diff = abs( df['money'][first] - df['money'][first+1] ) > 0
        # print TRUE, when adjacent values differ
        print(abs_diff)
        # update the counter
        first +=1

It prints the following sequence of booleans, one per line (shown here joined with commas): False, False, False, True, False, False, False, False, False, False, True, True, False, False, False, True, True, True, True, True, True

Question: How can I save that loop output in a list?

Upvotes: 0

Views: 143

Answers (4)

Chris Adams

Reputation: 18647

If I understand correctly, you can use groupby, diff, fillna and ne:

df.groupby('id')['money'].diff().fillna(0).ne(0).to_list()

[out]

[False,
 False,
 False,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 True,
 True,
 False,
 False,
 False,
 True,
 True,
 True,
 True,
 True,
 True]
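To see why this chain reproduces the loop's output, it can help to run it one step at a time. The sketch below rebuilds the question's data (written directly as integers rather than converted with `astype(int)`) and uses intermediate names (`step_diff`, `step_filled`, `changed`) that are illustrative only and not part of the answer above.

```python
import pandas as pd

# Same data as in the question, written directly as integers.
df = pd.DataFrame({
    'year': [2001, 2004, 2005, 2006, 2007, 2008, 2009,
             2003, 2004, 2005, 2006, 2007, 2008, 2009,
             2003, 2004, 2005, 2006, 2007, 2008, 2009],
    'id': [1] * 7 + [2] * 7 + [5] * 7,
    'money': [15, 15, 15, 21, 21, 21, 21,
              17, 17, 17, 20, 17, 17, 17,
              25, 30, 22, 25, 8, 7, 12],
})

# Difference to the previous row within each id; the first row of
# every id has no predecessor and therefore becomes NaN.
step_diff = df.groupby('id')['money'].diff()

# Treat "no predecessor" as "no change", matching the loop's leading False.
step_filled = step_diff.fillna(0)

# True wherever the value differs from the previous one.
changed = step_filled.ne(0)

result = changed.to_list()
print(result)
```

Because `changed` keeps the DataFrame's index, it could also be stored as a new column (for example `df['changed'] = changed`) instead of being converted to a list.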

Upvotes: 2

Aakash aggarwal

Reputation: 443

from collections import defaultdict
result = defaultdict(list)
for i in df.id.unique():
    # find the first and last index value
    first = df[df['id']==i].index.values.astype(int)[0] 
    last = df[df['id']==i].index.values.astype(int)[-1] 
    # first element has to be kept
    print(False)
    result[i].append(False)

    # my try: diff = [] 

    # for all elements, compare values next to each other
    while first < last:
        abs_diff = abs( df['money'][first] - df['money'][first+1] ) > 0
        # print TRUE, when adjacent values differ
        print(abs_diff)
        result[i].append(abs_diff)

        # my try: diff.append(abs_diff)

        # update the counter
        first +=1

I guess this will work if you want to save the results for each person (id) individually.
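As a rough sketch of the per-person idea, the version below collects one list of booleans per id in a `defaultdict` and then flattens them into a single list. It swaps the index-based while loop for a `zip` over adjacent values, and the names `per_person` and `flat` are assumptions made for illustration.

```python
from collections import defaultdict

import pandas as pd

# Only the columns needed for the comparison, taken from the question.
df = pd.DataFrame({
    'id': [1] * 7 + [2] * 7 + [5] * 7,
    'money': [15, 15, 15, 21, 21, 21, 21,
              17, 17, 17, 20, 17, 17, 17,
              25, 30, 22, 25, 8, 7, 12],
})

per_person = defaultdict(list)

# Iterating over a grouped column yields (id, values-for-that-id) pairs.
for person_id, money in df.groupby('id')['money']:
    values = money.to_list()
    key = int(person_id)
    # The first observation has nothing to compare against.
    per_person[key].append(False)
    # Compare each value with the one right after it.
    for prev, curr in zip(values, values[1:]):
        per_person[key].append(prev != curr)

# Flatten the per-person lists back into one list, in id order.
flat = [flag for flags in per_person.values() for flag in flags]
print(dict(per_person))
print(flat)
```

Keeping the results keyed by id makes it easy to look up one person's changes, at the cost of an extra flattening step when a single list is needed.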

Upvotes: 1

Batman

Reputation: 8917

If I understand your question correctly, you just need to define the variable outside of the for loop:

output = list()

for i in df.id.unique():
    # find the first and last index value
    first = df[df['id']==i].index.values.astype(int)[0] 
    last = df[df['id']==i].index.values.astype(int)[-1] 
    # first element has to be kept
    output.append(False)

    # for all elements, compare values next to each other
    while first < last:
        abs_diff = abs( df['money'][first] - df['money'][first+1] ) > 0
        # print TRUE, when adjacent values differ
        output.append(abs_diff)

        # update the counter
        first +=1
print(output)

If you create the list inside the for loop, it is reset on every pass, so you end up with only the output for the last id.
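The effect of where the list is created can be shown with a small toy example. The data in `groups` and the names `inside` and `outside` are hypothetical and chosen only to make the difference visible.

```python
# Hypothetical toy data: two groups with three values each.
groups = {'a': [1, 1, 2], 'b': [3, 4, 4]}

# List created inside the for loop: it is rebound to a fresh empty
# list on every pass, so earlier groups are lost.
for key, values in groups.items():
    inside = []
    inside.append(False)
    for prev, curr in zip(values, values[1:]):
        inside.append(prev != curr)

# List created once, before the for loop: results accumulate.
outside = []
for key, values in groups.items():
    outside.append(False)
    for prev, curr in zip(values, values[1:]):
        outside.append(prev != curr)

print(inside)   # only the flags for group 'b'
print(outside)  # flags for group 'a' followed by group 'b'
```

Here `inside` ends up with three entries while `outside` has six, one per input value.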

Upvotes: 1

Ghassen

Reputation: 794

Try this:

import pandas as pd
df = pd.DataFrame({'year': ['2001', '2004', '2005', '2006', '2007', '2008', '2009',
                             '2003', '2004', '2005', '2006', '2007', '2008', '2009',
                            '2003', '2004', '2005', '2006', '2007', '2008', '2009'],
                   'id': ['1', '1', '1', '1', '1', '1', '1',
                          '2', '2', '2', '2', '2', '2', '2',
                         '5', '5', '5','5', '5', '5', '5'],
                   'money': ['15', '15', '15', '21', '21', '21', '21',
                             '17', '17', '17', '20', '17', '17', '17',
                            '25', '30', '22', '25', '8', '7', '12']}).astype(int)
# for every person
l=list()
for i in df.id.unique():
    # find the first and last index value
    first = df[df['id']==i].index.values.astype(int)[0]
    last = df[df['id']==i].index.values.astype(int)[-1]
    # first element has to be kept
    print(False)
    l.append(False)
    # my try: diff = []

    # for all elements, compare values next to each other
    while first < last:
        abs_diff = abs( df['money'][first] - df['money'][first+1] ) > 0
        # print TRUE, when adjacent values differ
        l.append(abs_diff)
        # my try: diff.append(abs_diff)

        # update the counter
        first +=1
print(l)

output:

[False, False, False, True, False, False, False, False, False, False, True, True, False, False, False, True, True, True, True, True, True]

Upvotes: 1
