funkymunkypython

Copy and rename files using a pandas dataframe

I am a relatively new Python user, so I'm struggling with the following. I am trying to copy a bunch of files from one folder to another and rename them using a pandas DataFrame I built, states_mapping_df. I tried converting the DataFrame to strings with states_mapping_df = states_mapping_df.astype("string"), but that didn't help.

from shutil import copyfile
import pandas as pd

covid_src_dir = r"I:\COVID\COVID tracker\UserA\Hospitalizations"

covid_new_dir = r"I:\COVID\COVID tracker\Hospitalizations_images"

states_mapping_df = pd.DataFrame({"Abbr" :['CA','FL','IL','NJ','NY','NC','OH','PA','TX','VA'],
                                  "State_Name" :['California','Florida','Illinois','New Jersey','New York','North Carolina','Ohio','Pennsylvania','Texas','Virginia']})

for row in states_mapping_df['Abbr']:
    #oldname = states_mapping_df['Abbr']+'.png'
    #newname = states_mapping_df['State_Name']+'.png'
    oldpath_covid = covid_src_dir + "\\" + row +'.png'
    newpath_covid = covid_new_dir + "\\" + states_mapping_df['State_Name'].astype('string') +'.png'
    copyfile(oldpath_covid, newpath_covid)

I get the following error when I run it:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-260-40e52504bb33> in <module>
     19     oldpath_covid = covid_src_dir + "\\" + row +'.png'
     20     newpath_covid = covid_new_dir + "\\" + states_mapping_df['State_Name'].astype('string') +'.png'
---> 21     copyfile(oldpath_covid, newpath_covid)
     22     #shutil.copy(oldpath_covid, newpath_covid)

~\Anaconda3\lib\shutil.py in copyfile(src, dst, follow_symlinks)
    238     sys.audit("shutil.copyfile", src, dst)
    239 
--> 240     if _samefile(src, dst):
    241         raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
    242 

~\Anaconda3\lib\shutil.py in _samefile(src, dst)
    215     if hasattr(os.path, 'samefile'):
    216         try:
--> 217             return os.path.samefile(src, dst)
    218         except OSError:
    219             return False

~\Anaconda3\lib\genericpath.py in samefile(f1, f2)
     99     """
    100     s1 = os.stat(f1)
--> 101     s2 = os.stat(f2)
    102     return samestat(s1, s2)
    103 

TypeError: stat: path should be string, bytes, os.PathLike or integer, not Series

Answers (1)

Paul Wilson

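I believe your problem is here: states_mapping_df['State_Name']

The error is telling you that you're passing a Series where copyfile expects a single path. Because states_mapping_df['State_Name'] is a whole column, covid_new_dir + "\\" + states_mapping_df['State_Name'] + '.png' builds a Series of ten paths (one per state), so you are effectively trying to rename one file to an entire column of values. Adding .astype('string') changes the dtype of the values but leaves the result a Series, which is why that didn't help.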
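
A minimal demonstration of the point above, assuming a two-row subset of the question's mapping and a hypothetical destination folder name "dest". It shows that string concatenation with a column is applied element-wise and returns a Series, and that os.stat (which shutil.copyfile reaches via os.path.samefile, as in the traceback) rejects that Series:

```python
import os

import pandas as pd

# Two-row subset of the question's mapping, for illustration only
states_mapping_df = pd.DataFrame({"Abbr": ["CA", "FL"],
                                  "State_Name": ["California", "Florida"]})

# str + Series is applied element-wise, so the result is a Series of paths,
# even after .astype("string")
new_paths = "dest" + "\\" + states_mapping_df["State_Name"].astype("string") + ".png"
print(type(new_paths))
print(new_paths.tolist())

# os.stat accepts a single path (or file descriptor), not a whole Series
error_message = None
try:
    os.stat(new_paths)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```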

You need to pick out the single state name that matches the current abbreviation.

Try this:

# uses covid_src_dir, covid_new_dir, states_mapping_df and copyfile from the question
for row in states_mapping_df['Abbr']:
    # filter the df to the row whose Abbr matches the current abbreviation
    filt = (states_mapping_df['Abbr'] == row)
    # use .loc with the filter and the column name to isolate that cell
    row_filtered = states_mapping_df.loc[filt, 'State_Name']
    # .loc returns a one-element Series; .values[0] takes the cell value itself
    state_name = row_filtered.values[0]
    oldpath_covid = covid_src_dir + "\\" + row + '.png'
    # build the new path from the single state name, not the whole column
    newpath_covid = covid_new_dir + "\\" + state_name + '.png'
    copyfile(oldpath_covid, newpath_covid)
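
An alternative sketch, not the approach above: iterate over both columns at once with zip, so each iteration already holds a plain string for the abbreviation and one for the state name, and no per-row filter is needed. This relies on the two columns being aligned row by row, which they are in a single DataFrame. To keep it runnable anywhere, it assumes temporary folders and dummy .png files in place of the I:\ paths, a three-state subset of the mapping, and os.path.join instead of hard-coded backslashes:

```python
import os
import tempfile
from shutil import copyfile

import pandas as pd

states_mapping_df = pd.DataFrame({"Abbr": ["CA", "FL", "NC"],
                                  "State_Name": ["California", "Florida", "North Carolina"]})

# Temporary stand-ins for covid_src_dir and covid_new_dir (left on disk afterwards)
covid_src_dir = tempfile.mkdtemp()
covid_new_dir = tempfile.mkdtemp()

# Create dummy source images named by abbreviation, e.g. CA.png
for abbr in states_mapping_df["Abbr"]:
    with open(os.path.join(covid_src_dir, abbr + ".png"), "wb") as f:
        f.write(b"placeholder")

# zip yields one (abbreviation, state name) pair of strings per row
for abbr, state_name in zip(states_mapping_df["Abbr"], states_mapping_df["State_Name"]):
    oldpath_covid = os.path.join(covid_src_dir, abbr + ".png")
    newpath_covid = os.path.join(covid_new_dir, state_name + ".png")
    copyfile(oldpath_covid, newpath_covid)

print(sorted(os.listdir(covid_new_dir)))
```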

Edit: a bit more info.

.loc is a way of selecting from a pandas DataFrame by label. You call it as df.loc[a, b], where a selects rows and b selects columns. Most often, as in the code above, a is a filter you build first. For example, (df['State_Name'] == 'California') returns a boolean Series (one True/False per row) that is True only for the rows where the value is California. Passing that filter to .loc together with a single column name returns the matching values of that column as a Series (or a DataFrame if you pass a list of column names for b). Even when only one row matches, the result is still a one-element Series rather than a bare value, which is why the code calls .values, which returns the values as an array, and then takes the first item with [0].
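
A small sketch of those steps on a hypothetical three-row frame, filtering on Abbr as the answer's loop does. Series.item() is shown as a stricter alternative to .values[0]: it returns the single value and raises ValueError unless exactly one row matched:

```python
import pandas as pd

df = pd.DataFrame({"Abbr": ["CA", "FL", "TX"],
                   "State_Name": ["California", "Florida", "Texas"]})

filt = df["Abbr"] == "FL"               # boolean Series, one value per row
print(filt.tolist())

selected = df.loc[filt, "State_Name"]   # still a Series, here with one element
print(type(selected).__name__)

state_name = selected.values[0]         # first item of the underlying array
state_name_strict = selected.item()     # same value; errors unless exactly one match
print(state_name, state_name_strict)

# A filter plus a list of column labels returns a DataFrame instead
print(df.loc[filt, ["Abbr", "State_Name"]])
```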

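Another option is .iloc[], which works the same way except that it selects by integer position (the i stands for integer). Positions start at 0 and the end of a slice is excluded, so to return the row at position 10 (the 11th row) and the columns at positions 5 through 7 you would use df.iloc[10, 5:8].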
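
A quick check of that positional example, assuming a hypothetical 12 x 10 frame whose cells name their own row and column position:

```python
import pandas as pd

# Hypothetical frame: cell at position (r, c) holds the text "r{r}c{c}"
df = pd.DataFrame([[f"r{r}c{c}" for c in range(10)] for r in range(12)])

# Row at position 10, columns at positions 5, 6 and 7 (8 is excluded)
subset = df.iloc[10, 5:8]
print(subset.tolist())
```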

To return everything you could use df.iloc[:, :]. To return all columns for the rows where the state is California, reuse the same filter expression as above with df.loc[filt, :] (df.loc[filt, ::] is equivalent).
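
A sketch of both selections on a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Abbr": ["CA", "FL", "TX"],
                   "State_Name": ["California", "Florida", "Texas"]})

everything = df.iloc[:, :]                 # all rows, all columns

filt = df["State_Name"] == "California"
california_rows = df.loc[filt, :]          # only matching rows, all columns
print(everything.shape)
print(california_rows)
```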

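The colon expressions are slices, just like the ones you use on a list. One difference to keep in mind: .iloc slices exclude the end position, as list slices do, but .loc slices by label and includes the end label.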
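
A short comparison of the two slicing rules, assuming a hypothetical four-row frame with the default index labels 0 to 3:

```python
import pandas as pd

df = pd.DataFrame({"Abbr": ["CA", "FL", "IL", "NJ"],
                   "State_Name": ["California", "Florida", "Illinois", "New Jersey"]})

# .iloc: positional slice, end position excluded (like a list)
by_position = df.iloc[0:2]["Abbr"].tolist()

# .loc: label slice, end label included
by_label = df.loc[0:2, "Abbr"].tolist()

# Plain list slicing for comparison
by_list = ["CA", "FL", "IL", "NJ"][0:2]

print(by_position, by_label, by_list)
```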

More here:

loc https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

iloc https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html

Indexing and slicing https://realpython.com/lessons/indexing-and-slicing/

Various other filtering methods including those mentioned https://towardsdatascience.com/7-different-ways-to-filter-pandas-dataframes-9e139888382a
