tyleroki

Reputation: 301

String splitting and joining on a pandas dataframe

I have a dataframe containing devices and their corresponding firmware versions (e.g. 1.7.1.3). I'm trying to shorten each firmware version to show only its first three numbers (e.g. 1.7.1).

I know how to do this on a single string, but how would I do it efficiently for a large dataframe?

test = "1.2.3.4"
test = test.split(".")    # ['1', '2', '3', '4']
'.'.join(test[0:-1])      # '1.2.3' (drops the last component)
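As a baseline, the single-string logic above can be wrapped in a function and applied to every row with `Series.map`. This is a minimal sketch: the helper `truncate_version` and the column name `firmware` are hypothetical, and it still calls a Python function once per row, so it says nothing about speed on its own.

```python
import pandas as pd


def truncate_version(version: str) -> str:
    # Hypothetical helper: drop the last dot-separated component,
    # exactly like the single-string code above.
    parts = version.split(".")
    return ".".join(parts[:-1])


# Hypothetical sample data; "firmware" is an assumed column name.
df = pd.DataFrame({"firmware": ["1.7.1.3", "2.0.10.4", "1.2.3.9"]})
df["firmware"] = df["firmware"].map(truncate_version)
print(df["firmware"].tolist())  # ['1.7.1', '2.0.10', '1.2.3']
```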

Upvotes: 1

Views: 2297

Answers (3)

anubhava

Reputation: 785098

Here is one more way of doing it, using .str.replace:

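import pandas as pd
df = pd.DataFrame({'data': {0: '1.2.3.4', 1: '1.2.3.9', 2: '1.2.3.8'}})
# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching
df['data'] = df['data'].str.replace(r'\.[^.]*$', '', regex=True)
print(df['data'])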

Output:

0    1.2.3
1    1.2.3
2    1.2.3
Name: data, dtype: object

The pattern r'\.[^.]*$' matches the last dot and everything after it, and .str.replace replaces that match with an empty string.
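To see that the pattern only ever removes the final component, here is a minimal sketch on made-up version strings that include multi-digit components (the sample values are assumptions, not from the question):

```python
import pandas as pd

s = pd.Series(["1.7.1.3", "1.7.10.23", "10.0.2.100"])

# \.      a literal dot
# [^.]*   any run of characters that are not dots
# $       anchored to the end of the string
# Together they match only the final ".<component>".
# regex=True makes pandas treat the pattern as a regular expression.
trimmed = s.str.replace(r"\.[^.]*$", "", regex=True)
print(trimmed.tolist())  # ['1.7.1', '1.7.10', '10.0.2']
```

Because the match is anchored with `$`, a version with any number of components loses exactly one, so the result has three components only when the input has four.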

Upvotes: 2

RavinderSingh13

Reputation: 133458

This can also be done with pandas' str.extract function; could you please try the following:

df['data'] = df['data'].str.extract(r'^(\d+(?:\.\d+){2})', expand=False)

Simple explanation: str.extract applies the regex and returns only its capture group, which matches the first three dot-separated numbers (e.g. 1.7.1 from 1.7.1.3), as per the OP's need.
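A minimal sketch of how that regex behaves on made-up inputs, including one with multi-digit components and one with fewer than three components (both are assumptions added for illustration). `expand=False` is used so the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

s = pd.Series(["1.7.1.3", "1.7.10.23", "2.0"])

# ^             start of the string
# \d+           first numeric component
# (?:\.\d+){2}  exactly two more ".<digits>" groups (non-capturing)
# The outer (...) is the single capture group that extract returns.
first_three = s.str.extract(r"^(\d+(?:\.\d+){2})", expand=False)
print(first_three.tolist())
```

Rows that do not contain three components, such as "2.0" here, do not match and come back as NaN instead of being left unchanged.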



Taking the example DataFrame used by Anurag Dabas here:

Let's say df is the following:

    data
0   1.2.3.4
1   1.2.3.9
2   1.2.3.8

After running the above code, it becomes:

    data
0   1.2.3
1   1.2.3
2   1.2.3

Upvotes: 2

Anurag Dabas

Reputation: 24314

#sample dataframe:
import pandas as pd
df=pd.DataFrame({'data': {0: '1.2.3.4', 1: '1.2.3.9', 2: '1.2.3.8'}})

For this you can use:

df['data']=df['data'].str.split('.').str[0:3].apply('.'.join)
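On a column of lists, `Series.str.join` does the same job as `.apply('.'.join)`, which keeps the whole chain on the `.str` accessor. A minimal sketch, with multi-digit components added to the sample data as an assumption:

```python
import pandas as pd

df = pd.DataFrame({"data": ["1.2.3.4", "1.7.10.3", "10.0.2.1"]})

# str.split(".") turns each string into a list of components,
# .str[:3] keeps the first three items of every list,
# and .str.join(".") glues each list back into a string.
df["data"] = df["data"].str.split(".").str[:3].str.join(".")
print(df["data"].tolist())  # ['1.2.3', '1.7.10', '10.0.2']
```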

OR

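df['data']=df['data'].str[0:5]

(This keeps the first five characters, so it only works when each of the first three components is a single digit; 1.7.10.3 would become 1.7.1.)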

OR

df['data']=df['data'].str[::-1].str.split('.',n=1).str[1].str[::-1]
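The double reversal above exists only to split on the last dot. `Series.str.rsplit` with `n=1` splits from the right directly; this is an alternative to the answer's approach, sketched on sample data with one multi-digit component added as an assumption:

```python
import pandas as pd

df = pd.DataFrame({"data": ["1.2.3.4", "1.2.3.9", "1.7.10.3"]})

# rsplit(".", n=1) splits once, starting from the right, so each value
# becomes [everything before the last dot, last component].
# .str[0] keeps the part before the last dot.
df["data"] = df["data"].str.rsplit(".", n=1).str[0]
print(df["data"].tolist())  # ['1.2.3', '1.2.3', '1.7.10']
```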

Performance: (benchmark image not available)
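To compare the approaches on your own data, here is a minimal timing sketch using `timeit`. The sample strings and row count are made-up assumptions, and timings depend on hardware and pandas version, so no numbers are shown. The `str[0:5]` slice is left out because it breaks on multi-digit components.

```python
import timeit

import pandas as pd

# Synthetic data (an assumption): 20,000 four-part version strings.
df = pd.DataFrame({"data": ["1.7.10.3", "2.0.1.4", "10.3.2.1", "1.2.3.9"] * 5_000})

# Each candidate takes a Series and returns its first three components.
approaches = {
    "split + join": lambda s: s.str.split(".").str[0:3].str.join("."),
    "reverse + split": lambda s: s.str[::-1].str.split(".", n=1).str[1].str[::-1],
    "regex replace": lambda s: s.str.replace(r"\.[^.]*$", "", regex=True),
    "regex extract": lambda s: s.str.extract(r"^(\d+(?:\.\d+){2})", expand=False),
    "rsplit": lambda s: s.str.rsplit(".", n=1).str[0],
}

# Sanity check: all approaches must agree before their timings mean anything.
expected = approaches["split + join"](df["data"]).tolist()
for name, fn in approaches.items():
    assert fn(df["data"]).tolist() == expected, name

# Average wall-clock time over a few runs of each approach.
for name, fn in approaches.items():
    seconds = timeit.timeit(lambda: fn(df["data"]), number=3) / 3
    print(f"{name:>16}: {seconds * 1000:.1f} ms per run")
```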

Upvotes: 5
