Reputation: 405
I would like to know a better way to append information to a dataframe while in a loop. Specifically, I want to add COLUMNS of information to a dataframe conditionally. The code below technically works, but it is sloppy and, more importantly, it loses information such as the data type of each cell, because everything is converted to a string. Any tips would be great.
# Python 2 syntax (print statements)
import numpy as np
import pandas as pd

raw_data = {'first_name': ['John', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 20, 16, 24, '']}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age'])
headers = df.columns.values
count = 0
for index, row in df.iterrows():
    count += 1
    if row['age'] > 18:
        adult = True
    else:
        adult = False
    headers = np.append(headers,'ADULT')
    vals = np.append(row.values,adult)
    if count == 1:
        # print the header line once, before the first row
        print ','.join(headers.tolist())
        print str(vals.tolist()).replace('[','').replace(']','').replace("'","")
    else:
        print str(vals.tolist()).replace('[','').replace(']','').replace("'","")
Upvotes: 2
Views: 225
Reputation: 51425
This seems to give your desired outcome (at least, it's the same outcome as your loop):
df['ADULT'] = np.where(pd.to_numeric(df.age) > 18, True, False)
>>> df
  first_name last_name age  ADULT
0       John    Miller  42   True
1      Molly  Jacobson  20   True
2       Tina       Ali  16  False
3       Jake    Milner  24   True
4        Amy     Cooze      False
As pointed out by @Wen, this is much more straightforward, since the comparison already returns a boolean Series:
df['ADULT'] = pd.to_numeric(df.age) > 18
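To tie this back to the dtype concern in the question, here is a minimal end-to-end sketch, assuming Python 3 with pandas installed. The `errors='coerce'` argument and the `to_csv` call are my additions rather than part of the answer above: the first makes the handling of the empty age explicit, the second stands in for the hand-built comma-joined output of the original loop.

```python
import pandas as pd

raw_data = {'first_name': ['John', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
            'age': [42, 20, 16, 24, '']}
df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'age'])

# Assumption: non-numeric ages (such as the empty string) should count as
# "not adult". errors='coerce' turns them into NaN, and NaN > 18 is False.
age = pd.to_numeric(df['age'], errors='coerce')

# The comparison is vectorized and returns a boolean Series, so the new
# column keeps a real bool dtype instead of being flattened to strings.
df['ADULT'] = age > 18

print(df.dtypes)               # per-column dtypes; ADULT is bool
print(df.to_csv(index=False))  # comma-separated output with a header row
```

The conversion to text happens only at the final `to_csv` step, so the dataframe itself keeps its types for any further processing.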
Upvotes: 2