Reputation: 954
I want to insert np.nan into a dataframe: one NaN per row, at a random position.
This is my dataframe:
import numpy as np
import pandas as pd
list_cols = ['col01', 'col02', 'col03', 'col04', 'col05', 'col06', 'col07', 'col08', 'col09', 'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16']
X_full = pd.DataFrame(np.random.uniform(low=1.0, high=100.0, size=(5, 16)), columns=list_cols)
This is my code:
# Add a single nan value to each row
rng = np.random.RandomState(0)
n_samples, n_features = X_full.shape
X_missing = X_full.copy()
missing_samples = np.arange(n_samples)
missing_features = rng.choice(n_features, n_samples, replace=True)
X_missing[missing_samples, missing_features] = np.nan
It raises TypeError: unhashable type: 'numpy.ndarray'.
Thanks for help.
Upvotes: 0
Views: 180
Reputation: 2417
You could do:
X_missing = X_full.copy()
indexes = np.random.choice(range(X_missing.shape[1]), X_missing.shape[0])
X_missing.values[range(X_missing.shape[0]), indexes] = np.nan
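Writing through .values relies on it returning a writable view of the frame's data, which is not guaranteed (for example with mixed dtypes, or with pandas Copy-on-Write enabled). A minimal sketch of a variant that avoids that assumption is to edit an explicit NumPy copy and rebuild the frame. The seed, the generated column names, and the names arr and col_idx are illustrative choices, not taken from the question:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
cols = [f"col{i:02d}" for i in range(1, 17)]
X_full = pd.DataFrame(rng.uniform(1.0, 100.0, size=(5, 16)), columns=cols)

# Work on an explicit copy of the data so X_full is left untouched.
arr = X_full.to_numpy(copy=True)
n_rows, n_cols = arr.shape

# One random column position per row.
col_idx = rng.integers(0, n_cols, size=n_rows)

# Paired (row, column) fancy indexing: sets exactly one cell per row.
arr[np.arange(n_rows), col_idx] = np.nan

X_missing = pd.DataFrame(arr, index=X_full.index, columns=X_full.columns)
print(X_missing.isna().sum(axis=1).tolist())  # [1, 1, 1, 1, 1]
```

The paired index arrays are what make NumPy pick one cell per row instead of a whole block.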
Upvotes: 1
Reputation: 375
I'm not 100% sure I understand your question, but if you just want to set a single cell per row to NaN (i.e. the values at (0, 12), (1, 7), etc. should become NaN), you can use:
for row, column in zip(missing_samples, missing_features):
    X_missing.iat[row, column] = np.nan
Note the method starting with an 'i': .iat. It means the change is applied by integer position, not by row/column name.
In your example you get an error because plain X_missing[...] indexing is label-based: pandas treats what you pass as a column key, and a tuple of arrays can't be hashed as a label. What you gave it were numbers (namely the positions). You could use .iloc to make clear that you are passing positions and not names, but given two arrays it would replace every combination of those rows and columns rather than one cell per row. That's why I use .iat: I assume you only want to replace values 'at' specific 'indices'. Hope this helps.
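To see the difference between .iloc and .iat on a small case, here is a rough sketch. The 3x4 frame and the positions in rows and cols are made up for illustration and are not the asker's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones((3, 4)), columns=list("abcd"))
rows = np.array([0, 1, 2])
cols = np.array([1, 3, 0])  # hypothetical positions, one per row

# .iloc with two arrays selects every listed row x every listed column.
block = df.copy()
block.iloc[rows, cols] = np.nan
print(int(block.isna().sum().sum()))   # 9

# .iat sets one cell at a time, so pairing rows with cols gives one NaN per row.
single = df.copy()
for r, c in zip(rows, cols):
    single.iat[r, c] = np.nan
print(int(single.isna().sum().sum()))  # 3
```

For a handful of rows the loop is fine; for large frames, paired NumPy indexing on an array of the values is usually faster.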
Upvotes: 1