Reputation: 880
I want to create augmented data in a new dataframe for every row of an original dataframe.
So, I've defined augment method which I want to use in apply as following:
def augment(row: pd.Series, column_name: str, target_df: pd.DataFrame, num_samples: int):
# print(type(row))
target_df_start_index = target_df.shape[0]
raw_img = row[column_name].astype('uint8')
bin_image = convert_image_to_binary_image(raw_img)
bin_3dimg = tf.expand_dims(input=bin_image, axis=2)
bin_img_reshaped = tf.image.resize_with_pad(image=bin_3dimg, target_width=128, target_height=128, method="bilinear")
for i in range(num_samples + 1):
new_row = row.copy(deep=True)
if i == 0:
new_row[column_name] = np.squeeze(bin_img_reshaped, axis=2)
else:
aug_image = data_augmentation0(bin_img_reshaped)
new_row[column_name] = np.squeeze(aug_image, axis=2)
# display.display(new_row)
target_df.loc[target_df_start_index + i] = new_row
# print(target_df.shape)
# display.display(target_df)
When I call this as following, everything works:
tmp_df = pd.DataFrame(None, columns=testDF.columns)
augment(testDF.iloc[0], column_name='binMap', target_df=tmp_df, num_samples=4)
augment(testDF.iloc[1], column_name='binMap', target_df=tmp_df, num_samples=4)
However, when I try it using 'apply' method, I get the prints or the display working fine but the resultant dataframe shows error
tmp_df = pd.DataFrame(None, columns=testDF.columns)
testDF.apply(augment, args=('binMap', tmp_df, 4, ), axis=1)
This is how the o/p data looks like after the apply call -
,data
<Error>, <Error>
<Error>, <Error>
What am I doing wrong?
Upvotes: 1
Views: 38
Reputation: 20425
Your test is very nice, thank you for the clear exposition. I am happy to be your rubber duck.
In test A, you (successfully) mess with
testDF.iloc[0]
and [1]
,
using kind of a Fortran-style API
for augment(), leaving a side effect result in tmp_df.
Test B is carefully constructed to
be "the same" except for the .apply()
call.
So let's see, what's different?
Hard to say.
Let's go examine the docs.
Oh, right! We're using the .apply() API, so we'd better follow it. Down at the end it explains:
Returns: Series or DataFrame
Result of applying func along the given axis of the DataFrame.
But you're offering return None
instead.
Now, I'm not here to pass judgement on
whether it's best to have side effects
on a target df
-- that's up to you.
But .apply() will be bent out of shape
until you give it something nice
to store as its own result.
Happy hunting!
Tiny little style nit.
You wrote
args=('binMap', tmp_df, 4, )
to offer a 3-tuple. Better to write
args=('binMap', tmp_df, 4)
As written it tends to suggest 1-tuple notation.
When is trailing comma helpful?
x = (7,)
fruits = [
'apple',
'banana',
]
Upvotes: 1
Reputation: 880
This change worked for me -
def augment(row: pd.Series, column_name: str, target_df: pd.DataFrame, num_samples: int) -> pd.Series:
# print(type(row))
target_df_start_index = target_df.shape[0]
raw_img = row[column_name].astype('uint8')
bin_image = convert_image_to_binary_image(raw_img)
bin_3dimg = tf.expand_dims(input=bin_image, axis=2)
bin_img_reshaped = tf.image.resize_with_pad(image=bin_3dimg, target_width=128, target_height=128, method="bilinear")
for i in range(num_samples + 1):
new_row = row.copy(deep=True)
if i == 0:
new_row[column_name] = np.squeeze(bin_img_reshaped, axis=2)
else:
aug_image = data_augmentation0(bin_img_reshaped)
new_row[column_name] = np.squeeze(aug_image, axis=2)
# display.display(new_row)
target_df.loc[target_df_start_index + i] = new_row
# print(target_df.shape)
# display.display(target_df)
return row
And updated call to apply as following:
testDF = testDF.apply(augment, args=('binMap', tmp_df, 4, ), result_type='broadcast', axis=1)
Thank you @J_H. If there are better to way to achieve what I'm doing, please feel free to suggest the improvements.
Upvotes: 0