mattdns

Reputation: 894

Pandas - apply function over one column and return a new column

I have a function findFullPath that takes a file's seriesuid string and looks up the full path of that file in a list of full paths. For example:

    >>> i = 4000
    >>> serisuid = candidates.iloc[i].seriesuid
    >>> fullPath = findFullPath(serisuid,fullPaths)
    >>> print(serisuid)
    >>> print(fullPath)

    1.3.6.1.4.1.14519.5.2.1.6279.6001.100684836163890911914061745866
    /home/msmith/luna16/subset1/1.3.6.1.4.1.14519.5.2.1.6279.6001.100684836163890911914061745866.raw

I am trying to apply this function to the whole column candidates["seriesuid"] and store the results in a new column of full paths, using something like the line below, but so far without success:

    >>> candidates["seriesuidFullPaths"] = candidates[["seriesuid"]].apply(findFullPath,args=(fullPaths,),axis=1)

[EDIT]

Sorry for being a bit ambiguous. My function is:

def findFullPath(seriesuid,fullPaths):
    fullPath = [s.replace(".mhd",".raw") for s in fullPaths if serisuid in s][0]
    return fullPath

which works perfectly in the case-by-case code I gave at the top, but yields incorrect full file paths when I apply it over the series. Moreover, I get a copy error:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

However, I thought I was editing the actual DataFrame, so I am a bit confused.

[EXAMPLE]

>>> candidates.head()

                                                          seriesuid  coordX  \
0  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860  -56.08   
1  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860   53.21   
2  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860  103.66   
3  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860  -33.66   
4  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860  -32.25   

   coordY  coordZ  class  
0  -67.85 -311.92      0  
1 -244.41 -245.17      0  
2 -121.80 -286.62      0  
3  -72.75 -308.41      0  
4  -85.36 -362.51      0  

I just updated fullPaths to include just the .raw files.

>>> fullPaths = [path for path in fullPaths if ".raw" in path]
>>> fullPaths[:5] 
['/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.142154819868944114554521645782.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.211071908915618528829547301883.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.390009458146468860187238398197.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.463214953282361219537913355115.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.252358625003143649770119512644.raw']

I want to replace each seriesuid in candidates with the relevant .raw file path. Hope this clears it up.

Upvotes: 1

Views: 435

Answers (2)

jezrael

Reputation: 862406

You can create a new DataFrame p from the list fullPath, extract the seriesuid from each path into a new column seriesuid, and then merge p with the DataFrame candidates on the seriesuid column.

I changed the first and last items in the list fullPath for testing:

print(candidates)
                                           seriesuid  coordX  coordY  coordZ  \
0  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  -56.08  -67.85 -311.92   
1  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...   53.21 -244.41 -245.17   
2  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  103.66 -121.80 -286.62   
3  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  -33.66  -72.75 -308.41   
4  1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  -32.25  -85.36 -362.51   

   class  
0      0  
1      0  
2      0  
3      0  
4      0  
fullPath = ['/home/msmith/luna16/subset4/9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.211071908915618528829547301883.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.390009458146468860187238398197.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.463214953282361219537913355115.raw',
     '/home/msmith/luna16/subset4/9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860.raw']

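p = pd.DataFrame(fullPath, columns=['paths'])
# strip the .raw extension (regex=False so "." is matched literally)
p["paths"] = p["paths"].str.replace(".raw", "", regex=False)
# the file name is the sixth "/"-separated field of these paths
p['seriesuid'] = p['paths'].str.split('/').str[5]
print(p)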
                                               paths  \
0  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....   
1  /home/msmith/luna16/subset4/1.3.6.1.4.1.14519....   
2  /home/msmith/luna16/subset4/1.3.6.1.4.1.14519....   
3  /home/msmith/luna16/subset4/1.3.6.1.4.1.14519....   
4  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....   

                                           seriesuid  
0  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  
1  1.3.6.1.4.1.14519.5.2.1.6279.6001.211071908915...  
2  1.3.6.1.4.1.14519.5.2.1.6279.6001.390009458146...  
3  1.3.6.1.4.1.14519.5.2.1.6279.6001.463214953282...  
4  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  
print(pd.merge(candidates, p, on=['seriesuid']))
                                           seriesuid  coordX  coordY  coordZ  \
0  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  -56.08  -67.85 -311.92   
1  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  -56.08  -67.85 -311.92   

   class                                              paths  
0      0  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....  
1      0  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....  

If the paths in the list fullPath have different directory depths, you can split from the right instead:

fullPath = ['/home/msmith/luna16/subset4/9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.211071908915618528829547301883.raw',
     '/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.390009458146468860187238398197.raw',
     '/home/msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1.6279.6001.463214953282361219537913355115.raw',
     '/home/msmith/luna16/subset4/9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860.raw']

p = pd.DataFrame(fullPath, columns=['paths'])
# strip the .raw extension (regex=False so "." is matched literally)
p["paths"] = p["paths"].str.replace(".raw", "", regex=False)
# split once from the right on "/" and keep the last part as column seriesuid
p[['tmp','seriesuid']] = p['paths'].str.rsplit('/', expand=True, n=1)
# drop the helper column tmp
p = p.drop(['tmp'], axis=1)
print(p)
                                               paths  \
0  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....   
1  /home/msmith/luna16/subset4/1.3.6.1.4.1.14519....   
2  /msmith/luna16/subset4/1.3.6.1.4.1.14519.5.2.1...   
3  /home/msmith/luna16/subset4/1.3.6.1.4.1.14519....   
4  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....   

                                           seriesuid  
0  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  
1  1.3.6.1.4.1.14519.5.2.1.6279.6001.211071908915...  
2  1.3.6.1.4.1.14519.5.2.1.6279.6001.390009458146...  
3  1.3.6.1.4.1.14519.5.2.1.6279.6001.463214953282...  
4  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  
print(pd.merge(candidates, p, on=['seriesuid']))
                                           seriesuid  coordX  coordY  coordZ  \
0  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  -56.08  -67.85 -311.92   
1  9.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222...  -56.08  -67.85 -311.92   

   class                                              paths  
0      0  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....  
1      0  /home/msmith/luna16/subset4/9.3.6.1.4.1.14519....  
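
The same merge idea can be sketched without assuming a fixed directory depth and without dropping the .raw extension from the stored path. This is only an illustration under stated assumptions: the short seriesuid values and the two paths are made up, and os.path is swapped in for the string splitting used above.

```python
import os
import pandas as pd

# Hypothetical sample data; the short ids stand in for the long seriesuid strings.
candidates = pd.DataFrame({
    "seriesuid": ["1.2.3", "1.2.3", "4.5.6"],
    "coordX": [-56.08, 53.21, 103.66],
})
fullPaths = [
    "/home/msmith/luna16/subset4/1.2.3.raw",
    "/msmith/luna16/subset1/4.5.6.raw",
]

# Lookup table: one row per file, keyed by the file name without its extension.
p = pd.DataFrame({"paths": fullPaths})
p["seriesuid"] = [os.path.splitext(os.path.basename(path))[0] for path in fullPaths]

# A left merge keeps every candidate row; rows with no matching file get NaN in paths.
merged = candidates.merge(p, on="seriesuid", how="left")
print(merged)
```

Using how="left" is a design choice: the default inner merge, as in the output above, silently drops candidates that have no matching file.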

Upvotes: 1

Mini Fridge

Reputation: 939

This error:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

This usually happens when candidates is itself a slice of another DataFrame, so pandas cannot tell whether the assignment should also affect the original.

One way to get rid of it is to take an explicit copy:

candidates = df.loc[<Your condition>].copy()

where df is the source DataFrame that you initially created.
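
A minimal sketch of that pattern, assuming a made-up df and a made-up condition on the class column (neither is from the question):

```python
import pandas as pd

# Hypothetical source frame; only the column names echo the question.
df = pd.DataFrame({"seriesuid": ["a", "b", "c"], "class": [0, 1, 0]})

# The explicit copy makes candidates an independent DataFrame,
# so adding a column to it is unambiguous and leaves df untouched.
candidates = df.loc[df["class"] == 0].copy()
candidates["seriesuidFullPaths"] = "placeholder"

print(list(candidates.columns))
print(list(df.columns))
```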

After that, the following should work, provided findFullPath itself is correct:

candidates["seriesuidFullPaths"] = candidates[["seriesuid"]].apply(findFullPath,args=(fullPaths,),axis=1)
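
If the paths still come out wrong, one possible cause, assuming findFullPath is exactly as posted in the question, is that its body reads serisuid (a variable left over from the interactive session) instead of the seriesuid parameter, so every row would receive the same path. Also, candidates[["seriesuid"]].apply(..., axis=1) hands the function a one-element row rather than a string. A hedged sketch of both changes, with made-up ids and paths, and with a None fallback that is my own assumption (the original would raise IndexError on no match):

```python
import pandas as pd

def findFullPath(seriesuid, fullPaths):
    # Use the parameter itself rather than a global of a similar name.
    matches = [s.replace(".mhd", ".raw") for s in fullPaths if seriesuid in s]
    # Assumption: return None when no path contains the id.
    return matches[0] if matches else None

# Hypothetical sample data standing in for the real frame and path list.
candidates = pd.DataFrame({"seriesuid": ["1.2.3", "4.5.6", "1.2.3"]})
fullPaths = [
    "/home/msmith/luna16/subset1/1.2.3.mhd",
    "/home/msmith/luna16/subset4/4.5.6.raw",
]

# Single brackets select a Series, so each call receives one seriesuid string.
candidates["seriesuidFullPaths"] = candidates["seriesuid"].apply(
    findFullPath, args=(fullPaths,)
)
print(candidates)
```

Note that this scans the whole list once per row, so for a large candidates frame a merge on seriesuid is likely to be faster.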

Upvotes: 1
