Victor Castro

Reputation: 105

Filtering a dataframe column of lists to keep only words with length > 3

Community! I really appreciate all the support I've received so far on my journey learning Python!

I have the following dataframe:

import pandas as pd

d = {'name': ['john', 'mary', 'james'], 'area': [['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}
df = pd.DataFrame(data=d)


My goal is to keep, in each list of the 'area' column, only the words whose length is greater than 3, and to remove the shorter ones.

I tried something like this, but I'm really stuck:

[image: attempted code, not available]

What is the best way to approach this?

Thanks again!!

Upvotes: 2

Views: 39

Answers (3)

Andreas

Reputation: 9197

Combine .map with a list comprehension:

df['area'] = df['area'].map(lambda x: [e for e in x if len(e)>3])

0         [Resources, Admin]
1    [Software, Programming]
2       [Teaching, Research]

Explanation:

x = ["Software", "ABC", "Programming"]
# keep each element e of x, but only if its length is greater than 3
[e for e in x if len(e)>3]

Upvotes: 1

LazyAnalyst

Reputation: 476

Before you build the dataframe.

One simple and efficient way is to filter each list under the "area" key so that it contains only strings longer than 3 characters, and then build the dataframe from the filtered data. For example:

import pandas as pd

d = {'name': ['john', 'mary', 'james'], 'area': [['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}

# Retrieving the areas from d (a list of lists, one inner list per row).
area_list = d['area']

# Copying all values whose length is larger than 3 into new lists, one per row.
filtered_area_list = [[a for a in areas if len(a) > 3] for areas in area_list]

# Replacing the old lists in the dictionary with the filtered ones.
d['area'] = filtered_area_list

# Creating the dataframe.
df = pd.DataFrame(data=d)

After you build the dataframe.

If your data is already in a dataframe, you can use the "map" method of the column:

df['area'] = df['area'].map(lambda a: [e for e in a if len(e) > 3])
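
If some rows might not hold a list at all (for example a missing value), the lambda above would raise on len(). A minimal sketch of one way to guard against that, assuming you want such rows to become empty lists; the helper name keep_long_words, its min_len parameter, and the extra 'ana' row are hypothetical and only there for illustration:

```python
import pandas as pd


def keep_long_words(words, min_len=3):
    """Return the words longer than min_len; non-list values become an empty list."""
    if not isinstance(words, list):
        return []
    return [w for w in words if len(w) > min_len]


df = pd.DataFrame({
    "name": ["john", "mary", "james", "ana"],
    "area": [
        ["IT", "Resources", "Admin"],
        ["Software", "ITS", "Programming"],
        ["Teaching", "Research", "KS"],
        None,  # hypothetical row with a missing value
    ],
})

# Apply the helper to every cell of the 'area' column.
df["area"] = df["area"].map(keep_long_words)
```

Using a named function instead of a lambda also makes the threshold easy to change, e.g. df["area"].map(lambda a: keep_long_words(a, min_len=5)).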

Upvotes: 1

Alex

Reputation: 7045

You can expand all your lists, filter on str length and then put them back in lists by aggregating using list:

df = df.explode("area")
df = df[df["area"].str.len() > 3].groupby("name", as_index=False).agg(list)
#     name                     area
# 0  james     [Teaching, Research]
# 1   john       [Resources, Admin]
# 2   mary  [Software, Programming]
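
Note that grouping by "name" returns the rows sorted by name, and a row whose list ends up empty after filtering disappears from the result. If that matters, one possible variant is to group on the original index instead. This is only a sketch, assuming a pandas version that provides Series.explode; the 'ana' row is hypothetical, added to show the empty case:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "mary", "james", "ana"],
    "area": [
        ["IT", "Resources", "Admin"],
        ["Software", "ITS", "Programming"],
        ["Teaching", "Research", "KS"],
        ["IT", "KS"],  # hypothetical row where no word survives the filter
    ],
})

# One row per word; the original row index is repeated for each word.
exploded = df["area"].explode()

# Keep only the words longer than 3 characters.
kept = exploded[exploded.str.len() > 3]

# Rebuild one list per original row by grouping on the index.
filtered = kept.groupby(level=0).agg(list)

# Rows with no surviving words are absent from `filtered`; give them an empty list.
df["area"] = [filtered.get(i, []) for i in df.index]
```

This keeps the original row order and leaves the other columns untouched.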

Upvotes: 1
