Reputation: 1680
I am trying to remove all values in this pandas dataframe that have a length of less than 3, but only in some of the columns, not all of them.
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})
columns_to_add = ['player', 'hometown', 'current_city']
for column_name in columns_to_add:
    df.loc[(len(df[column_name]) < 3), column_name] = None
When I run the code above, I get the following error:
KeyError("cannot use a single bool to index into setitem")
Upvotes: 1
Views: 687
Reputation: 1680
The answer that correctly took all the requirements into account was the following:
import pandas as pd
import numpy as np
df0 = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})
columns_to_add = ['player', 'hometown', 'current_city']
# Build a boolean mask over the selected columns only and set the short strings to NaN
df0[df0[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan
# Replace the NaN placeholders with None
df = df0.where(pd.notnull(df0), None)
One important thing to understand is that columns_to_add does not include all the columns in the dataframe (here, id is excluded).
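For comparison, here is a minimal sketch of how the loop from the question could be repaired while keeping its column-by-column shape. It reuses the question's sample data; the variable name too_short is only illustrative. The key difference is that len(df[column_name]) returns the number of rows (a single integer), whereas .str.len() returns one length per row.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', 'George', 'Roland'],
    'hometown': ['Miami', 'Caracas', 'Mexico City'],
    'current_city': ['New York', '-', 'New York'],
})
columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    # .str.len() gives the length of each string, so the comparison
    # produces a boolean Series (a row mask) instead of a single bool.
    too_short = df[column_name].str.len() < 3
    # Only the rows flagged by the mask are overwritten in this column.
    df.loc[too_short, column_name] = None
```

Because the mask is built per column, columns outside columns_to_add are never touched.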
Upvotes: 0
Reputation: 884
I think the simplest solution might be:
new_df = df[columns_to_add]
# keep values of length 3 or more; shorter ones come back as NaN
new_df[new_df.applymap(len) >= 3]
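Note that indexing with a boolean frame only returns a masked copy and leaves df itself unchanged. A minimal sketch of writing the result back, assuming the sample data from the question (the names keep and masked are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', 'George', 'Roland'],
    'hometown': ['Miami', 'Caracas', 'Mexico City'],
    'current_city': ['New York', '-', 'New York'],
})
columns_to_add = ['player', 'hometown', 'current_city']

new_df = df[columns_to_add]
# True where the string has length 3 or more, i.e. where the value is kept.
keep = new_df.apply(lambda col: col.str.len() >= 3)
# Indexing with a boolean frame replaces the False cells with NaN.
masked = new_df[keep]
# Assign back so the original frame is actually updated.
df[columns_to_add] = masked
```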
Upvotes: 0
Reputation: 180
You can use the replace method:
def find_string_less_lenth(list_of_values):
    # collect every value shorter than 3 characters
    return [i for i in list_of_values if len(i) < 3]
for column_name in columns_to_add:
    # note: this writes the string 'none', not the None object
    df[column_name] = \
        df[column_name].replace(find_string_less_lenth(df[column_name].values), 'none')
Upvotes: 0
Reputation: 150735
You can use applymap to calculate the lengths, then np.where to update:
df[columns_to_add] = np.where(df[columns_to_add].applymap(len) >=3,
df[columns_to_add], None)
Output:
   id  player     hometown  current_city
0   1    None        Miami      New York
1   2  George      Caracas          None
2   3  Roland  Mexico City      New York
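On recent pandas versions, DataFrame.applymap is deprecated (since 2.1) in favour of DataFrame.map. A hedged sketch of the same approach that works on either side of that change, assuming the question's sample data (the names cols, elementwise and lengths are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', 'George', 'Roland'],
    'hometown': ['Miami', 'Caracas', 'Mexico City'],
    'current_city': ['New York', '-', 'New York'],
})
columns_to_add = ['player', 'hometown', 'current_city']

cols = df[columns_to_add]
# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
elementwise = cols.map if hasattr(cols, 'map') else cols.applymap
lengths = elementwise(len)
# Keep values of length 3 or more, put None everywhere else.
df[columns_to_add] = np.where(lengths >= 3, cols, None)
```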
Upvotes: 1
Reputation:
Try this:
df[df[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan
Output:
>>> df
   id  player     hometown  current_city
0   1     NaN        Miami      New York
1   2  George      Caracas           NaN
2   3  Roland  Mexico City      New York
Upvotes: 1