Sas3
Sas3

Reputation: 15

Pandas dataframe row data filtering

I have a column of data in pandas dataframe in Bxxxx-xx-xx-xx.y format. Only the first part (Bxxxx) is all I require. How do I split the data? In addition, I also have data in BSxxxx-xx-xx-xx format in the same column which I would like to remove using regex='^BS' command (For some reason, it's not working). Any help in this regard will be appreciated.BTW, I am using df.filter command.

Upvotes: 0

Views: 169

Answers (4)

Rohith
Rohith

Reputation: 1048

This should work.

df[df.col1.apply(lambda x: x.split("-")[0][0:2]!="BS")].col1.apply(lambda x: x.split("-")[0])

Upvotes: 1

harpan
harpan

Reputation: 8641

Consider below example:

df = pd.DataFrame({
    'col':['B123-34-gd-op','BS01010-9090-00s00','B000003-3frdef4-gdi-ortp','B1263423-304-gdcd-op','Bfoo3-poo-plld-opo', 'BSfewf-sfdsd-cvc']
})
print(df)

Output:

    col
0   B123-34-gd-op
1   BS01010-9090-00s00
2   B000003-3frdef4-gdi-ortp
3   B1263423-304-gdcd-op
4   Bfoo3-poo-plld-opo
5   BSfewf-sfdsd-cvc

Now Let's do two tasks:

  1. Extract Bxxxx part from Bxxx-xx-xx-xxx .
  2. Remove BSxxx formated strings.

Consider below code which uses startswith():

df[~df.col.str.startswith('BS')].col.str.split('-').str[0]

Output:

0        B123
2     B000003
3    B1263423
4       Bfoo3
Name: col, dtype: object

Breakdown:

df[~df.col.str.startswith('BS')] gives us all the string which do not start with BS. Next, We are spliting those string with - and taking the first part with .col.str.split('-').str[0] .

Upvotes: 0

Yilun Zhang
Yilun Zhang

Reputation: 9018

A one-liner solution would be:

df["column_name"] = df["column_name"].apply(lambda x: x[:5])

Upvotes: 0

drew_psy
drew_psy

Reputation: 105

You can define a function where in you treat Bxxxx-xx-xx-xx.y as a string and just extract the first 5 indexes.

    >>> def edit_entry(x):
    ...     return (str(x)[:5])
    >>> df['Column_name'].apply(edit_entry)

Upvotes: 0

Related Questions