Reputation: 105
Hi community! I really appreciate all the support I've received so far on my journey learning Python.
I have the following dataframe:
import pandas as pd
d = {'name': ['john', 'mary', 'james'], 'area': [['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}
df = pd.DataFrame(data=d)
My goal is:
In other words, if a word inside a list in the 'area' column is 3 characters long or shorter, remove it.
I'm trying something like this, but I'm really stuck
What is the best way of approaching this situation?
Thanks again!!
Upvotes: 2
Views: 39
Reputation: 9197
Combine .map with a list comprehension:
df['area'] = df['area'].map(lambda x: [e for e in x if len(e)>3])
0 [Resources, Admin]
1 [Software, Programming]
2 [Teaching, Research]
Explanation:
x = ["Software", "ABC", "Programming"]
# keep each element e of x, but only if its length is greater than 3
[e for e in x if len(e)>3]
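To make the explanation above concrete, here is a minimal sketch that runs the same list comprehension next to an equivalent explicit loop. The variable names filtered and filtered_loop are only illustrative and are not part of the original answer.

```python
x = ["Software", "ABC", "Programming"]

# The list comprehension from the answer: keep elements longer than 3 characters.
filtered = [e for e in x if len(e) > 3]

# The same logic written as an explicit loop, for comparison.
filtered_loop = []
for e in x:
    if len(e) > 3:
        filtered_loop.append(e)

print(filtered)       # ['Software', 'Programming']
print(filtered_loop)  # ['Software', 'Programming']
```

.map then applies this comprehension to the list stored in each row of the 'area' column.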
Upvotes: 1
Reputation: 476
One simple and efficient way is to build a new list for the "area" key that contains only the strings longer than 3 characters. For example:
import pandas as pd
d = {'name': ['john', 'mary', 'james'], 'area': [['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}
# Retrieving the areas from d (a list of lists, one inner list per person).
area_list = d['area']
# Copying all values whose length is larger than 3 into new inner lists.
filtered_area_list = [[a for a in areas if len(a) > 3] for areas in area_list]
# Replacing the old list in the dictionary with the new one.
d['area'] = filtered_area_list
# Creating the dataframe.
df = pd.DataFrame(data=d)
If your data is in a dataframe, then you can use the "map" function:
df['area'] = df['area'].map(lambda a: [e for e in a if len(e) > 3])
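If the threshold might change, the lambda can be replaced by a small named function. The following is only a sketch: keep_long_words and its min_len parameter are hypothetical names introduced for illustration, not something from the question.

```python
import pandas as pd


def keep_long_words(words, min_len=4):
    """Hypothetical helper: keep only the words with at least min_len characters."""
    return [w for w in words if len(w) >= min_len]


d = {
    "name": ["john", "mary", "james"],
    "area": [
        ["IT", "Resources", "Admin"],
        ["Software", "ITS", "Programming"],
        ["Teaching", "Research", "KS"],
    ],
}
df = pd.DataFrame(data=d)

# Apply the helper to the list stored in each row of the 'area' column.
df["area"] = df["area"].map(keep_long_words)
print(df)
```

With the default min_len=4 this keeps the same words as len(e) > 3 in the lambda version.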
Upvotes: 1
Reputation: 7045
You can expand all your lists, filter on str length, and then put them back in lists by aggregating with list:
df = df.explode("area")
df = df[df["area"].str.len() > 3].groupby("name", as_index=False).agg(list)
# name area
# 0 james [Teaching, Research]
# 1 john [Resources, Admin]
# 2 mary [Software, Programming]
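Note that groupby sorts by name by default, which is why james comes first in the output above. Here is a sketch of a variant that keeps the original row order by passing sort=False. It assumes the names are unique, and the intermediate names exploded, long_only and result are illustrative only. One caveat of the explode approach: a row whose words are all 3 characters or shorter would disappear from the result entirely instead of keeping an empty list.

```python
import pandas as pd

d = {
    "name": ["john", "mary", "james"],
    "area": [
        ["IT", "Resources", "Admin"],
        ["Software", "ITS", "Programming"],
        ["Teaching", "Research", "KS"],
    ],
}
df = pd.DataFrame(data=d)

# One row per (name, word) pair.
exploded = df.explode("area")

# Keep only the words longer than 3 characters.
long_only = exploded[exploded["area"].str.len() > 3]

# Collect the words back into one list per name; sort=False keeps first-seen order.
result = long_only.groupby("name", sort=False)["area"].agg(list).reset_index()
print(result)
```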
Upvotes: 1