Reputation: 183
I have a pandas data frame of 30 000 rows that looks like this:
ID year month var1-var300 test
1111 2017 7 ... 1
1111 2017 9 ... 0
2222 2017 6 ... 1
2222 2017 6 ... 0
2222 2016 6 ... 0
3333 2017 3 ... 1
3333 2017 3 ... 0
3333 2015 8 ... 0
...
Here is what I want to do for each row: if test=1, I would like to extract the variables "ID year month", loop over the entire data frame and if this combination of variables is found in any other row, assign 1 to a new variable 'check'. The final data frame should look like this:
ID year month var1-var300 test check
1111 2017 7 ... 1 0
1111 2017 9 ... 0 0
2222 2017 6 ... 1 1
2222 2017 6 ... 0 0
2222 2016 6 ... 0 0
3333 2017 3 ... 1 1
3333 2017 3 ... 0 0
3333 2015 8 ... 0 0
...
Here is some kind of pseudo-code I have imagined:
for line in df:
if line['test']=1:
I=line['ID']
Y=line['year']
MO=line['month']
for row in df:
if row['ID']=I & row['year']=Y & row['month']=MO:
line['check']=1
break
Any idea how to do a similar code that works in Python?
Upvotes: 0
Views: 223
Reputation: 197
You can make some changes as below and try:
for line in df:
if line['test']==1:
I=line['ID']
Y=line['year']
MO=line['month']
for row in df:
if row['ID']==I and row['year']==Y and row['month']==MO:
line['check']=1
break
Upvotes: 0
Reputation: 1161
I think you can just use a transform to count group. Then you can get the result. Just two lines.
This is my solution.
Create Test Data:
import pandas as pd
ID = [1111, 1111, 2222, 2222, 2222, 3333, 3333, 3333]
year = [2017, 2017, 2017, 2017, 2016, 2017, 2017, 2015]
month = [7, 9, 6, 6, 6, 3, 3, 8]
test = [1, 0, 1, 0, 0, 1, 0, 0]
df = pd.DataFrame({
"ID": ID,
"year": year,
"month": month,
"test": test
})
Get the result:
df.loc[:, "group_count"] = df.groupby(["ID", "year", "month"]).transform("count").values
df.loc[:, "check"] = ((df["test"]>0) & (df["group_count"] > 1)).astype(int)
Upvotes: 1
Reputation: 103
So you want a single column where it is indicated if the corresponding row's ID, year and month correspond to ID, year and month of a row with test == 1?
You iterate with iterrows():
to_check = []
for index, row in df.iterrows():
if row['test']==1: # in your pseudocode, you use single =; that's for assigning variables
to_ckeck.append([row['ID'], row['year'], row['month']])
check = []
for index, row in df.iterrows():
if [row['ID'], row['year'], row['month']] in to_check:
check.append(1)
else:
check.append(0)
df["check"] = check
Upvotes: 0
Reputation: 36839
You should be able to invert your logic:
def func(group):
if len(group) > 1:
group.loc[group['test'] == 1, 'check'] = 1
return group
df = df.groupby(['ID', 'year']).apply(func)
Upvotes: 1