Reputation: 115

iterate in all Dataframe rows and perform startswith()

my df look like this:

as you can see the User starts with 'ff' and it could be in access column or any other column rather than user column.

i want to create a new column in this df called "UserID" where whenever the is 'ff' in all the columns copy this value to my new column "UserId"

i have been using this method which is working fine but i have to repeat this line in all the columns:

hist.loc[hist.User.str.startswith("ff",na=False),'UserId']=hist['User'].str[2:]

is there any other method i can use to loop over all rows at once?

thanks

Upvotes: 1

Answers (3)

Arty

Reputation: 16747

NumPy + Pandas solution below.

In case of ambiguity (several ff-strings in a row) leftmost occurance is taken. In case of absence (no ff-string in a row) NaN value is used.

Try it online!

import pandas as pd, numpy as np

df = pd.DataFrame({
    'user':    ['fftest',     'fwadmin',     'fshelpdesk3', 'no',   'ffone'],
    'access':  ['fwadmin',    'ffuser2',     'fwadmin',     'user', 'fftwo'],
    'station': ['fshelpdesk', 'fshelpdesk2', 'ffuser3',     'here', 'three'],
})

sv = df.values.astype(np.str)
ix = np.argwhere(np.char.startswith(sv, 'ff'))[::-1].T
df.loc[ix[0], 'UserID'] = pd.Series(sv[(ix[0], ix[1])]).str[2:].values

print(df)

Output:

          user   access      station UserID
0       fftest  fwadmin   fshelpdesk   test
1      fwadmin  ffuser2  fshelpdesk2  user2
2  fshelpdesk3  fwadmin      ffuser3  user3
3           no     user         here    NaN
4        ffone    fftwo        three    one

Upvotes: 0

Fjord

Reputation: 31

Hey here is my attempt at solving the problem, hope it helps.

d = df[df.apply(lambda x: x.str.startswith('ff'))]
df['user_id'] = d['user'].fillna(d['access'].fillna(d['station']))

Result

        user   access      station  user_id
0       fftest  fwadmin   fshelpdesk   fftest
1      fwadmin  ffuser2  fshelpdesk2  ffuser2
2  fshelpdesk3  fwadmin      ffuser3  ffuser3

Upvotes: 0

Marcin

Reputation: 1391

If you are cool with picking only the first occurence:

df['UserID'] = df.apply(lambda x: x[x.str.startswith('ff')][:1], axis=1)

Upvotes: 1

iterate in all Dataframe rows and perform startswith()

Answers (3)

Related Questions