I have a DataFrame like this:
df
a b c d e f
0 Banana Orange Lychee Custardapples Jackfruit Pineapple
1 Apple Pear Strawberry Muskmelon Apricot Peach
2 Raspberry Cherry Plum Kiwi Mango Blackberry
I want to remove one value from each column, chosen at random.
For example:
   a          b       c           d              e          f
0  Banana     Orange              Custardapples  Jackfruit
1             Pear    Strawberry                 Apricot    Peach
2  Raspberry          Plum        Kiwi                      Blackberry
Upvotes: 1
Views: 488
Reputation: 42886
Use the built-in pandas method Series.sample with the n=1 argument. I replace the values with NaN, since that's more elegant:
import numpy as np
# pick one random row label per column and blank that cell with NaN
for col in df.columns:
    df.loc[df[col].sample(n=1).index, col] = np.nan
a b c d e f
0 NaN NaN Lychee Custardapples Jackfruit Pineapple
1 Apple Pear NaN Muskmelon Apricot Peach
2 Raspberry Cherry Plum NaN NaN NaN
If you actually want empty strings instead, assign '' rather than NaN.
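A self-contained sketch of the same Series.sample idea, wrapped in a helper that works on a copy and accepts a seed so the result is reproducible. The name blank_one_per_column and its fill/seed parameters are hypothetical, not from the answer. It passes a shared np.random.RandomState as random_state so each column gets its own draw. It also assumes the index labels are unique: with duplicate labels, .loc would blank every row that shares the sampled label.

```python
import numpy as np
import pandas as pd

# The example frame from the question, rebuilt so the sketch runs on its own
df = pd.DataFrame({
    "a": ["Banana", "Apple", "Raspberry"],
    "b": ["Orange", "Pear", "Cherry"],
    "c": ["Lychee", "Strawberry", "Plum"],
    "d": ["Custardapples", "Muskmelon", "Kiwi"],
    "e": ["Jackfruit", "Apricot", "Mango"],
    "f": ["Pineapple", "Peach", "Blackberry"],
})


def blank_one_per_column(frame, fill=np.nan, seed=None):
    """Return a copy of frame with one randomly chosen cell per column set to fill.

    Hypothetical helper; assumes frame has a unique index.
    """
    rng = np.random.RandomState(seed)  # shared RNG, advanced by every sample() call
    out = frame.copy()                 # leave the caller's frame untouched
    for col in out.columns:
        # sample(n=1) picks one row of this column; .index gives its label
        label = out[col].sample(n=1, random_state=rng).index
        out.loc[label, col] = fill
    return out


result = blank_one_per_column(df, seed=0)
print(result)
```

Calling blank_one_per_column(df, fill="") gives empty strings instead of NaN.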
Upvotes: 1
Reputation: 36604
You can pick a random row for each column (random x, y coordinates) and set that cell to "":
for i in range(df.shape[1]):
    df.iloc[np.random.randint(df.shape[0]), i] = ""
Full code:
import pandas as pd
import numpy as np
df = pd.read_clipboard()
print(df)
a b c d e f
0 Banana Orange Lychee Custardapples Jackfruit Pineapple
1 Apple Pear Strawberry Muskmelon Apricot Peach
2 Raspberry Cherry Plum Kiwi Mango Blackberry
Then loop over all columns:
for i in range(df.shape[1]):
    df.iloc[np.random.randint(df.shape[0]), i] = ""
   a          b       c       d              e          f
0             Orange  Lychee  Custardapples  Jackfruit  Pineapple
1  Apple                      Muskmelon      Apricot
2  Raspberry  Cherry  Plum                              Blackberry
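The per-column loop can also be done in one NumPy step: draw one random row position per column, then assign all of them at once with fancy indexing. This is a sketch, not part of the original answer. The helper blank_random_cells and its fill/seed parameters are hypothetical, and it swaps np.random.randint for NumPy's Generator API (default_rng / integers). Because it works on positions, like iloc, it does not depend on the index being unique.

```python
import numpy as np
import pandas as pd

# The example frame from the question, rebuilt so the sketch runs on its own
df = pd.DataFrame({
    "a": ["Banana", "Apple", "Raspberry"],
    "b": ["Orange", "Pear", "Cherry"],
    "c": ["Lychee", "Strawberry", "Plum"],
    "d": ["Custardapples", "Muskmelon", "Kiwi"],
    "e": ["Jackfruit", "Apricot", "Mango"],
    "f": ["Pineapple", "Peach", "Blackberry"],
})


def blank_random_cells(frame, fill="", seed=None):
    """Return a copy of frame with one random row per column set to fill (positional)."""
    rng = np.random.default_rng(seed)
    arr = frame.to_numpy(dtype=object, copy=True)  # object array so any fill value fits
    n_rows, n_cols = arr.shape
    # one random row position per column; the upper bound of integers() is exclusive
    rows = rng.integers(0, n_rows, size=n_cols)
    # pairs (rows[j], j) address exactly one cell in each column j
    arr[rows, np.arange(n_cols)] = fill
    return pd.DataFrame(arr, index=frame.index, columns=frame.columns)


result = blank_random_cells(df, seed=42)
print(result)
```

Passing fill=np.nan gives NaN instead of empty strings, matching the other answer's output.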
Upvotes: 2