Reputation: 23099
apologies my multiple attempts looking through SO have been in vain. I need some assistance with the below:
I have a df as follows :
import pandas as pd
df = pd.DataFrame({'subject' : ['Eng' ,'Math', 'Science'], 'role' : ['Teacher', '', 'Dr'], 'ID' : ['123', '444', 'NaN'] })
print(df)
ID role subject
0 123 Teacher Eng
1 444 Math
2 NaN Dr Science
As this is potentially unvalidated data, what I would usually do (in excel boo) is to create a memo field which aggregates the data and does not potentially damage any good data on upload.
so I've been toying with itterrows to try to loop through the df and merge the fields into one. The caveat being that I need to use /n to separate the values in the same column (CHAR(10) for you excel buffs)
so I've tried..
for index, row in df2.iterrows():
if row['ID'] and row['subject] and row['role'] == "": #search row for any data while leaving out blanks
row['C_MEMO'] = row[['ID', 'subject', 'role']]
else:
pass
but this isn't really working, any advice/help would be appreciated.
my desired output:
ID role subject C_MEMO
0 123 Teacher Eng 123,
Teacher
Eng
1 444 Math 444,
Math
2 NaN Dr Science Dr,
Science
things to consider:
I'm working with multiple data types.
I'm happy to have the new output as a string.
Upvotes: 3
Views: 93
Reputation: 164823
You can use a list comprehension with str.join
:
bad_vals = {'', 'Nan'}
df['C_Memo'] = [', '.join([i for i in row if i not in bad_vals]) for row in df.values]
print(df)
ID role subject C_Memo
0 123 Teacher Eng 123, Teacher, Eng
1 444 Math 444, Math
2 NaN Dr Science Dr, Science
This is not necessarily worse than pd.DataFrame.apply
, which has overheads associated with iterating over pd.Series
objects for each row.
Upvotes: 1
Reputation: 863531
Use:
#first replace values
df = df.replace(['', 'NaN'], np.nan)
df["C_MEMO"] = df.apply(lambda x: '\n'.join(x.dropna()), axis=1)
Or if missing values are strings only:
df["C_MEMO"] = df.apply(lambda x: '\n'.join(x[~x.isin(['', 'NaN'])]), axis=1)
print(df)
subject role ID C_MEMO
0 Eng Teacher 123 Eng\nTeacher\n123
1 Math NaN 444 Math\n444
2 Science Dr NaN Science\nDr
Upvotes: 2
Reputation: 82795
import pandas as pd
df = pd.DataFrame({'subject' : ['Eng' ,'Math', 'Science'], 'role' : ['Teacher', '', 'Dr'], 'ID' : ['123', '444', 'NaN'] })
df["C_MEMO"] = (df["ID"] + "," + df["role"] + "," + df["subject"]).str.replace(",,", ",")
print(df)
Output:
ID role subject C_MEMO
0 123 Teacher Eng 123,Teacher,Eng
1 444 Math 444,Math
2 NaN Dr Science NaN,Dr,Science
Or using df.apply
df["C_MEMO"] = df.apply(lambda x: ", ".join([x["ID"], x["role"], x["subject"]]).replace(", ,", ","), axis=1)
Upvotes: 3