curtiplas

Reputation: 158

How to strip/replace "domain\" from Pandas DataFrame Column?

I have a pandas DataFrame read in from a CSV. Along with a bunch of other columns, it has a column of computer hostnames that include the domain each computer belongs to. I'm trying to strip out the domain information so that I'm left with ONLY the hostname.

DataFrame ex:

name
domain1\computername1
domain1\computername45
dmain3\servername1
dmain3\computername3
domain1\servername64
....

I've tried using both str.strip() and str.replace(), with a regex as well as a string literal, but I can't seem to target the domain information correctly.

Examples of what I've tried thus far:

df['name'].str.strip('.*\\')

df['name'].str.replace('.*\\', '', regex = True)

df['name'].str.replace(r'[.*\\]', '', regex = True)

df['name'].str.replace('domain1\\\\', '', regex = False)
df['name'].str.replace('dmain3\\\\', '', regex = False)

None of these seem to make any changes when I spit the DataFrame out using logging.debug(df)

Upvotes: 0

Views: 433

Answers (3)

Ryszard Czech

Reputation: 18621

No regex approach with ntpath.basename:

import pandas as pd
import ntpath
df = pd.DataFrame({'name':[r'domain1\computername1']})
df["name"] = df["name"].apply(ntpath.basename)

Results: computername1.

With rsplit:

df["name"] = df["name"].str.rsplit('\\').str[-1]
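
To compare the two approaches side by side, here is a minimal sketch. It assumes a DataFrame rebuilt by hand from a few of the question's sample rows; the names via_ntpath and via_rsplit are only illustrative, and n=1 is added to rsplit so it splits once from the right.

```python
import ntpath

import pandas as pd

# Sample values copied from the question; raw strings keep the backslash literal.
df = pd.DataFrame({"name": [
    r"domain1\computername1",
    r"domain1\computername45",
    r"dmain3\servername1",
]})

# ntpath.basename treats the backslash as a path separator
# and returns the last path component.
via_ntpath = df["name"].apply(ntpath.basename)

# rsplit with n=1 splits once from the right; .str[-1] takes the last piece.
via_rsplit = df["name"].str.rsplit("\\", n=1).str[-1]

print(via_ntpath.tolist())
print(via_rsplit.tolist())
```

Both should give the bare hostnames for this data. Note that ntpath.basename also treats "/" as a separator, which may or may not matter for your CSV.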

Upvotes: 0

SeaBean

Reputation: 23217

You are already close to the answer, just use:

df['name'] = df['name'].str.replace(r'.*\\', '', regex = True)

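Compared with one of your attempts, this only adds the r prefix (raw string) to the pattern and assigns the result back to the column.

Without the r prefix, Python processes the escape first, so '.*\\' is the three-character string .*\ and the regex engine receives a single trailing backslash, which is an incomplete escape rather than a literal backslash. With the r prefix, both backslashes reach the regex engine, which reads the pair \\ as one literal backslash. Because .* is greedy, the pattern then matches everything up to and including the last backslash, and that is what gets replaced with the empty string.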
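
The difference between the two literals can be checked outside pandas with the standard re module. This is a small sketch; the variable names are illustrative only.

```python
import re

# Without the r prefix, Python collapses the two backslashes into one,
# so the string holds three characters: dot, star, backslash.
plain = '.*\\'

# With the r prefix, both backslashes are kept, so the string holds four characters.
raw = r'.*\\'

print(len(plain), len(raw))

# The raw pattern is valid: greedy "anything", then one literal backslash.
stripped = re.sub(raw, '', r'domain1\computername1')
print(stripped)

# The plain pattern ends in a dangling backslash, which re rejects.
try:
    re.compile(plain)
    plain_compiles = True
except re.error:
    plain_compiles = False
print(plain_compiles)
```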

Output:

0     computername1
1    computername45
2       servername1
3     computername3
4      servername64
Name: name, dtype: object
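
One more thing worth checking, since none of the attempts in the question assign their result: the .str methods return a new Series and do not modify the DataFrame in place. A minimal sketch, assuming a two-row DataFrame built from the question's sample values:

```python
import pandas as pd

df = pd.DataFrame({"name": [r"domain1\computername1", r"dmain3\servername1"]})

# Calling the method alone builds a new Series and leaves df untouched.
df["name"].str.replace(r".*\\", "", regex=True)
unchanged = df["name"].tolist()

# Assigning the result back is what updates the column.
df["name"] = df["name"].str.replace(r".*\\", "", regex=True)
changed = df["name"].tolist()

print(unchanged)
print(changed)
```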

Upvotes: 1

Andrej Kesely

Reputation: 195543

You can use .str.split:

df["name"] = df["name"].str.split("\\", n=1).str[-1]
print(df)

Prints:

             name
0   computername1
1  computername45
2     servername1
3   computername3
4    servername64
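
If some rows might have no domain prefix, or more than one backslash, splitting from the left and from the right behave differently. The sketch below uses hypothetical values ("barehost" and the nested "corp\domain1\..." prefix are not in the question's data) to show the difference:

```python
import pandas as pd

# Hypothetical edge cases that do not appear in the question's data.
s = pd.Series([
    r"domain1\computername1",
    "barehost",
    r"corp\domain1\servername1",
])

# split with n=1 cuts at the first backslash; .str[-1] keeps everything after it.
first_split = s.str.split("\\", n=1).str[-1]

# rsplit with n=1 cuts at the last backslash instead.
last_split = s.str.rsplit("\\", n=1).str[-1]

print(first_split.tolist())
print(last_split.tolist())
```

A value without any backslash comes back unchanged either way. With a nested prefix, only the rsplit variant leaves just the final component.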

Upvotes: 0
