Reputation: 548
I am trying to get a non-null value for the last name, falling back to Middle when LastName is missing, but I am receiving an error. How can I resolve this? P.S. I have 20 million rows.
DataFrame:
FirstName  Middle  LastName
Tom        Ju      NaN
Kity       NaN     Rob
My attempt:
for row in df:
    if row['LastName'].isnull() == True:
        row['real_lastName'] = row['Middle']
    else:
        row['real_lastName'] = row['LastName']
I get the following error:
TypeError: string indices must be integers
Upvotes: 1
Views: 13220
Reputation: 12417
Another option:
df["real_lastName"] = df['Middle'].replace(np.nan, '') + df['LastName'].replace(np.nan, '')
Upvotes: 0
Reputation: 862691
Use numpy.where:
df['real_lastName'] = np.where(df['LastName'].isnull(), df['Middle'], df['LastName'] )
print (df)
  FirstName Middle LastName real_lastName
0       Tom     Ju      NaN            Ju
1      Kity    NaN      Rob           Rob
Another possible solution is to use fillna or combine_first:
df['real_lastName'] = df['LastName'].fillna(df['Middle'])
df['real_lastName'] = df['LastName'].combine_first(df['Middle'])
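To check that the three approaches agree, here is a minimal self-contained sketch. It rebuilds the two sample rows from the question; the variable names via_where, via_fillna and via_combine are only illustrative and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "FirstName": ["Tom", "Kity"],
    "Middle": ["Ju", np.nan],
    "LastName": [np.nan, "Rob"],
})

# np.where returns a NumPy array, so wrap it in a Series to compare like with like.
via_where = pd.Series(
    np.where(df["LastName"].isnull(), df["Middle"], df["LastName"]),
    index=df.index,
)
# fillna and combine_first both take values from Middle where LastName is missing.
via_fillna = df["LastName"].fillna(df["Middle"])
via_combine = df["LastName"].combine_first(df["Middle"])

print(via_where.tolist(), via_fillna.tolist(), via_combine.tolist())
```

All three are vectorized, so none of them needs a Python-level loop over the rows.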
Performance is similar:
#[200000 rows x 4 columns]
df = pd.concat([df] * 100000, ignore_index=True)
In [41]: %timeit df['real_lastName'] = np.where(df['LastName'].isnull(), df['Middle'], df['LastName'] )
13.3 ms ± 51.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [42]: %timeit df['real_lastName'] = df['LastName'].fillna(df['Middle'])
16.2 ms ± 58.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [43]: %timeit df['real_lastName'] = df['LastName'].combine_first(df['Middle'])
13 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
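As for the TypeError in the question: iterating over a DataFrame directly yields its column labels rather than its rows, so each row in that loop is a plain string. A small sketch on the question's sample data shows this (the names labels and error_message are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "FirstName": ["Tom", "Kity"],
    "Middle": ["Ju", np.nan],
    "LastName": [np.nan, "Rob"],
})

# Iterating a DataFrame yields column labels, not rows.
labels = [item for item in df]
print(labels)

# Each label is a str, so indexing it with a string key raises TypeError.
try:
    labels[0]["LastName"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Row-wise iteration is possible with df.iterrows() or df.itertuples(), but for millions of rows the vectorized expressions above are likely to be much faster.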
Upvotes: 4