Reputation: 1680

Python, pandas: Removing values with a small length in dataframe

I am trying to remove all values shorter than 3 characters from this pandas dataframe, but only in some of the columns:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    df.loc[(len(df[column_name]) < 3), column_name] = None

I am trying the code above, but I get the following error:

KeyError("cannot use a single bool to index into setitem")

Upvotes: 1

Answers (5)

The Dan

Reputation: 1680

The answer that handled all the variables correctly was the following:

import pandas as pd
import numpy as np

df0 = pd.DataFrame({'id': [1, 2, 3],'player': ['w', 'George', 'Roland'], 'hometown': ['Miami', 'Caracas', 'Mexico City'], 'current_city': ['New York', '-', 'New York']})

columns_to_add = ['player', 'hometown', 'current_city']

# Blank out strings shorter than 3 characters, in the selected columns only
df0[df0[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan
# Replace the NaN placeholders with None (object columns)
df = df0.where(pd.notnull(df0), None)

One important thing to understand is that columns_to_add does not include every column in the dataframe (id is left out).
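
For context, the loop in the question fails because `len(df[column_name])` is the number of rows in the column, so the comparison yields one boolean rather than a mask with one entry per row. A minimal sketch of the same loop with a per-row mask, assuming every selected column holds only strings:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', 'George', 'Roland'],
    'hometown': ['Miami', 'Caracas', 'Mexico City'],
    'current_city': ['New York', '-', 'New York'],
})

columns_to_add = ['player', 'hometown', 'current_city']

for column_name in columns_to_add:
    # .str.len() gives one length per row, so the comparison is a row mask
    too_short = df[column_name].str.len() < 3
    df.loc[too_short, column_name] = None
```

The id column is never touched because it is not in columns_to_add.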

Upvotes: 0

Nicolai B. Thomsen

Reputation: 884

I think the simplest solution might be

new_df = df[columns_to_add]
# Keep values of length 3 or more; shorter ones show up as NaN
new_df[new_df.applymap(len) >= 3]

Upvotes: 0

Marya

Reputation: 180

You can use the replace method on each column:

def find_string_less_lenth(list_of_values):
    # Collect the values shorter than 3 characters
    return [i for i in list_of_values if len(i) < 3]

for column_name in columns_to_add:
    df[column_name] = df[column_name].replace(
        find_string_less_lenth(df[column_name].values), 'none')

Upvotes: 0

Quang Hoang

Reputation: 150735

You can use applymap to calculate the length, then np.where to update:

df[columns_to_add] = np.where(df[columns_to_add].applymap(len) >=3, 
                              df[columns_to_add], None)

Output:

   id  player     hometown current_city
0   1    None        Miami     New York
1   2  George      Caracas         None
2   3  Roland  Mexico City     New York
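
Note that `DataFrame.applymap` is deprecated as of pandas 2.1 in favour of `DataFrame.map`. A sketch of the same idea that avoids both, assuming every selected column holds only strings, builds the lengths with the `.str` accessor and lets `where` blank the rejected cells (with NaN here rather than None):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'player': ['w', 'George', 'Roland'],
    'hometown': ['Miami', 'Caracas', 'Mexico City'],
    'current_city': ['New York', '-', 'New York'],
})

columns_to_add = ['player', 'hometown', 'current_city']

subset = df[columns_to_add]
# One length per cell, computed column by column
lengths = subset.apply(lambda col: col.str.len())
# Keep cells with length >= 3; the rest are filled with NaN
df[columns_to_add] = subset.where(lengths >= 3)
```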

Upvotes: 1

user17242583

Reputation:

Try this:

df[df[columns_to_add].apply(lambda col: col.str.len() < 3)] = np.nan

Output:

>>> df
   id  player     hometown current_city
0   1     NaN        Miami     New York
1   2  George      Caracas          NaN
2   3  Roland  Mexico City     New York

Upvotes: 1
