popeye

Reputation: 291

How to replace directory path in each row of a column with pandas?

I have a pandas DataFrame with a Filename column that looks like this:

Filename
/var/www/html/projects/Bundesliga/Match1/STAR_SPORTS_2-20170924-200043-210917-00001.jpg
/var/www/html/projects/Bundesliga/Match1/STAR_SPORTS_2-20170924-200043-210917-00001.jpg

In the Filename column, I want to replace the directory part of each path with a new destination directory:

dst = "/home/mycomp/Images"

I have tried the following:

df['Filename'] = df['Filename'].str.replace(os.path.dirname(df['Filename']), dst)

But I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/posixpath.py", line 129, in dirname
    i = p.rfind('/') + 1
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'rfind'

Upvotes: 1

Views: 3843

Answers (3)

jpp

Reputation: 164693

Here is one way, using a regular expression.

import os, re

dst = r'/home/mycomp/Images'

paths = '|'.join([re.escape(s) for s in set(df['Filename'].map(os.path.dirname))])

# regex=True is needed on pandas >= 2.0, where str.replace defaults to a literal match
df['Filename'] = df['Filename'].str.replace(paths, dst, regex=True)

#                                             Filename
# 0  /home/mycomp/Images/STAR_SPORTS_2-20170924-200...
# 1  /home/mycomp/Images/STAR_SPORTS_2-20170924-200...

Explanation

  • Extract the directory of every path, escape special characters, and join the unique directories into a single pattern separated by | (regex "or"). This ensures that every directory occurring in the series is replaced.
  • Use os.path.dirname to extract the directory part correctly across platforms.
  • Use pd.Series.str.replace in regex mode to replace each matched directory with dst.

Upvotes: 0

tgrandje

Reputation: 2534

df['Filename'] = df['Filename'].apply(lambda x: x.replace(os.path.dirname(x), dst))

Upvotes: 5

JoeCondron

Reputation: 8906

The problem is in os.path.dirname(df['Filename']): you are passing a Series where it expects a str. Instead, you can use filenames = df['Filename'].str.split('/').str[-1] to get the file names without the directory, and then dst + '/' + filenames to get the new paths. It is better to define dst with a trailing slash, dst = '/home/mycomp/Images/', so that dst + filenames is enough.
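
A minimal sketch of this split-and-prepend approach, assuming POSIX-style paths separated by '/'. The sample DataFrame is made up for illustration: the first path comes from the question, while the second one (Match2/example.jpg) is hypothetical and only there to show that rows with different directories are handled the same way.

```python
import pandas as pd

# Sample data modeled on the question; the second path is hypothetical.
df = pd.DataFrame({
    "Filename": [
        "/var/www/html/projects/Bundesliga/Match1/STAR_SPORTS_2-20170924-200043-210917-00001.jpg",
        "/var/www/html/projects/Bundesliga/Match2/example.jpg",
    ]
})

# Trailing slash, so no extra separator is needed when concatenating.
dst = "/home/mycomp/Images/"

# Split each path on '/' and keep the last piece, i.e. the bare file name.
filenames = df["Filename"].str.split("/").str[-1]

# Prepend the destination directory to every file name.
df["Filename"] = dst + filenames

print(df["Filename"].tolist())
```

Because the directory is dropped rather than searched for, this does not depend on what the original directories were, and no regex escaping is needed.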

Upvotes: 1
