Reputation: 301
I have a dataframe containing devices and their corresponding firmware versions (e.g. 1.7.1.3). I'm trying to shorten the firmware version to only show three numbers (e.g. 1.7.1).
I know how to do this on a single string but how would I make it efficient for a large dataframe?
test = "1.2.3.4"
test = test.split(".")
'.'.join(test[0:-1])
Upvotes: 1
Views: 2297
Reputation: 785098
Here is one more way of doing it, using .str.replace:
import pandas as pd
df = pd.DataFrame({'data': {0: '1.2.3.4', 1: '1.2.3.9', 2: '1.2.3.8'}})
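# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['data'] = df['data'].str.replace(r'\.[^.]*$', '', regex=True)
print(df['data'])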
Output:
0 1.2.3
1 1.2.3
2 1.2.3
Name: data, dtype: object
The pattern r'\.[^.]*$' matches the last dot and the text after it, which is replaced with an empty string.
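To see what the pattern does outside pandas, here is a minimal sketch using the standard-library re module. The helper name drop_last_component is hypothetical and only there for illustration:

```python
import re

# Same pattern as in the answer: a literal dot followed by any run of
# non-dot characters, anchored at the end of the string.
LAST_COMPONENT = re.compile(r'\.[^.]*$')

def drop_last_component(version: str) -> str:
    """Remove the final dot-separated component, if there is one."""
    return LAST_COMPONENT.sub('', version)

print(drop_last_component('1.7.1.3'))     # 1.7.1
print(drop_last_component('10.12.3.45'))  # 10.12.3
print(drop_last_component('7'))           # 7 (no dot, so nothing is removed)
```

Note that this always drops the last component, however many there are, so a value that is already '1.2.3' would become '1.2'.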
Upvotes: 2
Reputation: 133458
This can also be done with the pandas str.extract function. Please try the following:
df['data'] = df['data'].str.extract(r'^(\d+(?:\.\d+){2})', expand=True)
Simple explanation: str.extract applies the regex and captures only the first three dot-separated numbers, as the OP needs.
Taking the example DataFrame used by Anurag Dabas, let's say df is the following:
data
0 1.2.3.4
1 1.2.3.9
2 1.2.3.8
After running the code above, it becomes:
data
0 1.2.3
1 1.2.3
2 1.2.3
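Unlike the replace approach, extract only returns a value when the string starts with three numeric components. A small sketch of that behaviour, using hypothetical sample values (multi-digit components, a two-part version, and a non-version string) and expand=False so the result is a Series:

```python
import pandas as pd

# Hypothetical sample values, not from the original question.
s = pd.Series(['10.12.3.45', '1.7.1', '1.7', 'unknown'])

# With a single capture group, expand=False returns a Series
# instead of a one-column DataFrame.
short = s.str.extract(r'^(\d+(?:\.\d+){2})', expand=False)

# Rows that do not match the pattern come back as missing values.
print(short)
print(short.isna().tolist())  # [False, False, True, True]
```

If rows with fewer than three components should be kept unchanged, something like short.fillna(s) would be one way to fall back to the original value.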
Upvotes: 2
Reputation: 24314
#sample dataframe:
import pandas as pd
df=pd.DataFrame({'data': {0: '1.2.3.4', 1: '1.2.3.9', 2: '1.2.3.8'}})
For this you can use:
df['data']=df['data'].str.split('.').str[0:3].apply('.'.join)
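OR (only when every version component is a single digit, since this keeps just the first five characters):
df['data']=df['data'].str[0:5]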
OR
df['data']=df['data'].str[::-1].str.split('.', n=1).str[1].str[::-1]
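The reverse, split, reverse variant can also be written with str.rsplit, which splits from the right. A minimal sketch, assuming the last component is always the one to drop (the 'short' column name and the extra sample values are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'data': ['1.2.3.4', '10.12.3.45', '1.7.1.3']})

# rsplit works from the right; n=1 limits it to a single split, so
# element 0 of each result is everything before the last dot.
df['short'] = df['data'].str.rsplit('.', n=1).str[0]

print(df['short'].tolist())  # ['1.2.3', '10.12.3', '1.7.1']
```

This handles multi-digit components, which the fixed-width slice does not.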
Performance:
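To compare the variants on your own machine, a rough timing sketch with the standard-library timeit module could look like this. The data size, the labels in the candidates dict, and the repeat counts are arbitrary assumptions, and the numbers will vary with pandas version and hardware:

```python
import timeit
import pandas as pd

# Hypothetical test data: the three sample versions repeated 10,000 times.
df = pd.DataFrame({'data': ['1.2.3.4', '1.2.3.9', '1.2.3.8'] * 10_000})

# Each candidate returns a new Series and leaves df untouched.
candidates = {
    'split_join': lambda: df['data'].str.split('.').str[0:3].apply('.'.join),
    'fixed_slice': lambda: df['data'].str[0:5],
    'rsplit': lambda: df['data'].str.rsplit('.', n=1).str[0],
    'regex_replace': lambda: df['data'].str.replace(r'\.[^.]*$', '', regex=True),
}

# Best of three repeats, three calls per repeat, reported per call.
timings = {
    name: min(timeit.repeat(fn, number=3, repeat=3)) / 3
    for name, fn in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{name:15s} {seconds * 1e3:8.2f} ms')
```

The fixed-width slice is only comparable to the others on data like this, where every component is a single digit.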
Upvotes: 5