robahall

Reputation: 25

Extract string from cell in Pandas dataframe

I have a data frame, df:

         Filename         Weight
0  '\file path\file.txt'    NaN
1  '\file path\file.txt'    NaN
2  '\file path\file.txt'    NaN

and I have a function that takes a file name and extracts a float value from that file. For each row in df, I want to pass the file path in Filename to my function and write the result to the Weight column. My current code is:

df['Weight'] = df['Weight'].apply(x_wgt_pct(df['filename'].to_string()), axis = 1)

My error is:

pandas\parser.pyx in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3173)()

pandas\parser.pyx in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5912)()

IOError: File 0      file0.txt
1      file1.txt
2      file2.txt
3      file3.txt does not exist

I'm not sure whether this error occurs because all the file paths are being passed at once as a single string, or because I did not input the file path correctly.

Upvotes: 2

Views: 14843

Answers (1)

Andy Hayden

Reputation: 375565

to_string creates a string from the column, which isn't what you want:

In [11]: df['Filename'].to_string()
Out[11]: "0  '\\file    path\\file.txt'\n1  '\\file    path\\file.txt'\n2  '\\file    path\\file.txt'"

Assuming that x_wgt_pct is the function that takes a filepath and returns a float... you can loop through the entries:

for i, f in enumerate(df["Filename"]):
    weight = x_wgt_pct(f)  # Note: you may have to slice off the quotes, i.e. f[1:-1]
    df.loc[i, "Weight"] = weight  # .loc replaces .ix, which was removed in pandas 1.0

Note: some further care has to be taken if you have duplicate row indices.
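
As an alternative to the explicit loop, you could probably apply the function to the Filename column directly, since Series.apply calls it once per value (and takes no axis argument). This is only a sketch: the x_wgt_pct below is a hypothetical stand-in that returns the path length, because the real function isn't shown in the question, and the file names are made up.

```python
import pandas as pd


def x_wgt_pct(path):
    # Hypothetical stand-in for the asker's function. The real one reads the
    # file at `path` and returns a float; this one just returns the length.
    return float(len(path))


# Made-up data in the same shape as the question's frame.
df = pd.DataFrame({
    "Filename": ["'a.txt'", "'bb.txt'", "'ccc.txt'"],
    "Weight": [float("nan")] * 3,
})

# Strip the surrounding quotes, then call the function once per row.
df["Weight"] = df["Filename"].str.strip("'").apply(x_wgt_pct)
```

Because the result is assigned back as a whole column aligned on the index, this form doesn't need any row-by-row label lookups.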

Upvotes: 1
