user58519

Reputation: 629

How to extract part of a string from a DataFrame column?

I have a DataFrame like this:

print(df)

      ID  ...  Control
0  PDF-1  ...      NaN
1  PDF-3  ...      NaN
2  PDF-4  ...      NaN

I want only the number from the ID column, so the result would be:

1
3
4

How can I extract just that part of each string in the column?

Upvotes: 0

Views: 193

Answers (4)

Brandon Campbell

Reputation: 67

Find "PDF-" and replace it with an empty string, storing the result in a new column:

df['cleanID'] = df['ID'].str.replace('PDF-', '')

Then, to print the values the way you asked, convert the column to a string without the index:

print(df['cleanID'].to_string(index=False))
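A self-contained sketch of this approach, in case it helps to run it end to end. The sample data is an assumption modelled on the question, and `regex=False` is added here so that "PDF-" is treated as a literal string regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's ID column.
df = pd.DataFrame({"ID": ["PDF-1", "PDF-3", "PDF-4"]})

# Store the stripped values in a new column so the original ID stays intact.
# regex=False treats "PDF-" as a literal substring, not a pattern.
df["cleanID"] = df["ID"].str.replace("PDF-", "", regex=False)

# Print only the values, without the index column.
print(df["cleanID"].to_string(index=False))
```

Note that the values in cleanID are still strings at this point, not integers.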

Upvotes: 0

David

Reputation: 8298

Another possibility is to use a regular expression:

df.ID.str.extract(r'(\d+)')

This avoids changing the original data just to extract the integers.

So for the following simple example:

import pandas as pd

df = pd.DataFrame({'ID':['PDF-1','PDF-2','PDF-3','PDF-4','PDF-5']})
print(df.ID.str.extract(r'(\d+)'))
print(df)

we get the following:

   0
0  1
1  2
2  3
3  4
4  5

      ID
0  PDF-1
1  PDF-2
2  PDF-3
3  PDF-4
4  PDF-5
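If a plain column of integers is wanted rather than a one-column DataFrame of strings, one option is sketched below. It assumes every ID contains at least one digit; a row without a match would come back as NaN and make the integer conversion fail.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["PDF-1", "PDF-2", "PDF-3", "PDF-4", "PDF-5"]})

# expand=False returns a Series (not a DataFrame) for a single capture group.
# astype(int) converts the extracted digit strings to integers.
numbers = df["ID"].str.extract(r"(\d+)", expand=False).astype(int)

print(numbers.tolist())  # [1, 2, 3, 4, 5]
```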

Upvotes: 0

RavinderSingh13

Reputation: 133528

Could you please try the following:

df['ID'].replace(regex=True,to_replace=r'([^\d])',value=r'')

See the documentation for df.replace for details.

This uses a regex to remove everything except digits from the ID column: \d matches a digit, so [^\d] matches any character that is not a digit.
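To make the effect of the [^\d] pattern concrete, here is a small sketch on made-up IDs (the sample values are assumptions, not taken from the question). The last value shows a caveat: digits on either side of the hyphen are all kept and end up joined together.

```python
import pandas as pd

# Hypothetical IDs with differing prefixes.
ids = pd.Series(["PDF-1", "DOC-3", "PDF2-10"])

# Replace every non-digit character with an empty string.
digits_only = ids.replace(regex=True, to_replace=r"[^\d]", value="")

print(digits_only.tolist())  # ['1', '3', '210']
```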

Upvotes: 1

RomanPerekhrest

Reputation: 92854

How about just replacing the common PDF- prefix?

df['ID'].str.replace('PDF-', '')
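One thing to be aware of: a plain replace removes "PDF-" wherever it occurs in the string, not only at the start. If only a leading prefix should be stripped, an anchored pattern is one way to do it. The sketch below uses made-up IDs to show the difference.

```python
import pandas as pd

# Hypothetical IDs; the last one contains "PDF-" in the middle.
ids = pd.Series(["PDF-1", "PDF-3", "MY-PDF-4"])

# Literal replace: removes "PDF-" anywhere in the string.
anywhere = ids.str.replace("PDF-", "", regex=False)

# Anchored regex: removes "PDF-" only when it starts the string.
prefix_only = ids.str.replace(r"^PDF-", "", regex=True)

print(anywhere.tolist())     # ['1', '3', 'MY-4']
print(prefix_only.tolist())  # ['1', '3', 'MY-PDF-4']
```

For the data in the question, where every ID starts with PDF-, both forms give the same result.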

Upvotes: 1
