PeterB

Reputation: 2414

Remove certain strings from lists of strings in a pandas.DataFrame column

I have a pandas.DataFrame:

    index    question_id    tag
    0        1858           [pset3, game-of-fifteen]
    1        2409           [pset4]
    2        4346           [pset6, cs50submit]
    3        9139           [pset8, pset5, gradebook]
    4        9631           [pset4, recover]

I need to remove every string from each list in the tag column except the pset* strings.

So I need to end with something like this:

    index    question_id    tag
    0        1858           [pset3]
    1        2409           [pset4]
    2        4346           [pset6]
    3        9139           [pset8, pset5]
    4        9631           [pset4]

How can I do that please?

Upvotes: 1

Views: 1590

Answers (3)

Vaishali

Reputation: 38415

You can also use Python's in operator. Note that this keeps any tag that contains 'pset' anywhere, not only tags that start with it:

df.tag = df.tag.apply(lambda x: [elem for elem in x if 'pset' in elem])

0           [pset3]
1           [pset4]
2           [pset6]
3    [pset8, pset5]
4           [pset4]
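To make that difference concrete, here is a small sketch comparing the in check with startswith. The tag "cs50-pset-help" is hypothetical (it is not in the question's data); it contains "pset" without starting with it. The second row also shows that a list with no matching tags ends up as an empty list.

```python
import pandas as pd

# Hypothetical data: "cs50-pset-help" contains "pset" but does not start with it.
df = pd.DataFrame({"tag": [["pset3", "cs50-pset-help"], ["recover"]]})

# Substring test: keeps any tag containing "pset" anywhere.
with_in = df["tag"].apply(lambda x: [elem for elem in x if "pset" in elem])

# Prefix test: keeps only tags beginning with "pset".
with_startswith = df["tag"].apply(lambda x: [elem for elem in x if elem.startswith("pset")])

print(with_in.tolist())          # [['pset3', 'cs50-pset-help'], []]
print(with_startswith.tolist())  # [['pset3'], []]
```

If the real tags can never contain "pset" in the middle, both versions give the same result on the question's data.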

Upvotes: 2

James

Reputation: 36608

You can apply a function to the tag Series that builds a new list containing only the elements that start with 'pset':

df.tag.apply(lambda x: [xx for xx in x if xx.startswith('pset')])

# returns:
0           [pset3]
1           [pset4]
2           [pset6]
3    [pset8, pset5]
4           [pset4]
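The expression above returns a new Series and leaves df itself unchanged, so assign the result back if you want to keep it. A self-contained sketch that rebuilds the question's data (assuming the index column in the question's printout is just the default RangeIndex, not a real column):

```python
import pandas as pd

# Rebuild the question's DataFrame; "index" is assumed to be the default RangeIndex.
df = pd.DataFrame({
    "question_id": [1858, 2409, 4346, 9139, 9631],
    "tag": [
        ["pset3", "game-of-fifteen"],
        ["pset4"],
        ["pset6", "cs50submit"],
        ["pset8", "pset5", "gradebook"],
        ["pset4", "recover"],
    ],
})

# apply returns a new Series, so assign it back to replace the column.
df["tag"] = df["tag"].apply(lambda x: [xx for xx in x if xx.startswith("pset")])
print(df)
```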

Upvotes: 2

akuiper

Reputation: 214957

One option: use the apply method to loop through the items in the tag column and, for each item, use a list comprehension that keeps only the strings starting with the prefix "pset" (via the startswith method):

df['tag'] = df.tag.apply(lambda lst: [x for x in lst if x.startswith("pset")])
df
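If you would rather avoid a per-row lambda, a different technique (not one of the answers here) is to explode the lists into one row per tag, filter with pandas' str.startswith, and regroup by the original index. This sketch assumes pandas 0.25 or later, where Series.explode was added. Rows with no matching tag disappear in the groupby, so they are filled back in as empty lists; whether this is faster than apply depends on your data.

```python
import pandas as pd

df = pd.DataFrame({
    "question_id": [1858, 2409, 4346, 9139, 9631],
    "tag": [
        ["pset3", "game-of-fifteen"],
        ["pset4"],
        ["pset6", "cs50submit"],
        ["pset8", "pset5", "gradebook"],
        ["pset4", "recover"],
    ],
})

# One row per tag; each element keeps the label of the row it came from.
exploded = df["tag"].explode()

# Keep tags starting with "pset"; na=False treats NaN (from empty lists) as a non-match.
kept = exploded[exploded.str.startswith("pset", na=False)]

# Collapse back to one list per original row label, preserving tag order.
regrouped = kept.groupby(level=0).apply(list)

# Rows with no "pset" tag are missing after the groupby; restore them as empty lists.
df["tag"] = regrouped.reindex(df.index).apply(lambda v: v if isinstance(v, list) else [])
print(df)
```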

Upvotes: 2
