Reputation: 1745
I am wondering if there is a more performant way to iterate through a pandas dataframe and concatenate values from different columns.
For example, I have the following working code:
import pandas as pd
from pathlib import Path
data = {'subdir': ['tom', 'phil', 'ava'],
        'filename': ['9.wav', '8.wav', '7.wav'],
        'text': ['Pizza', 'Strawberries and yogurt', 'potato']}
df = pd.DataFrame(data, columns = ['subdir', 'filename', 'text'])
df.head()
example_path = Path(r"C:\Hello\World")
for index, row in df.iterrows():
    full_path = example_path.joinpath(row['subdir'], row['filename'])
    print(full_path)
    text = row['text']
    print(text)
Output:
C:\Hello\World\tom\9.wav
Pizza
C:\Hello\World\phil\8.wav
Strawberries and yogurt
C:\Hello\World\ava\7.wav
potato
However, I have a large number of rows and would like to do this as fast as possible. What is the best way to do this? I am taking pieces of a path (the subdirectory and the base file name) and concatenating them as I iterate through the dataframe.
I will also likely be grabbing data from adjacent columns (like 'text' in the example) and storing it as I iterate, so I'd like to do this all in one pass. Once I have gathered all of the data in list-like or Series-like structures, I will use those pieces to build a dictionary or dataframe.
Thank you.
Upvotes: 0
Views: 326
Reputation: 680
You can always make a path column in your df using the .apply method:
import pandas as pd
import pathlib
data = {'subdir': ['tom', 'phil', 'ava'],
        'filename': ['9.wav', '8.wav', '7.wav'],
        'text': ['Pizza', 'Strawberries and yogurt', 'potato']}
df = pd.DataFrame(data, columns = ['subdir', 'filename', 'text'])
df["path"] = df[['subdir','filename']].apply(
    lambda x: pathlib.Path(
        r"C:\Hello\World\{}\{}".format(
            x['subdir'], x['filename']
        )
    ),
    axis=1
)
print(df[['path','text']])
Out:
                        path                     text
0   C:\Hello\World\tom\9.wav                    Pizza
1  C:\Hello\World\phil\8.wav  Strawberries and yogurt
2   C:\Hello\World\ava\7.wav                   potato
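If .apply with axis=1 turns out to be slow on many rows, a list comprehension over the zipped columns is a common alternative, since it avoids building a Series for every row. This is only a minimal sketch under the same column names as above. Note that PureWindowsPath and the name base are my own choices here, not part of the original code; PureWindowsPath is used so the backslash-separated result is the same on any operating system.

```python
import pandas as pd
from pathlib import PureWindowsPath

data = {'subdir': ['tom', 'phil', 'ava'],
        'filename': ['9.wav', '8.wav', '7.wav'],
        'text': ['Pizza', 'Strawberries and yogurt', 'potato']}
df = pd.DataFrame(data, columns=['subdir', 'filename', 'text'])

# Assumption: a pure Windows path, so joining behaves the same on Linux/macOS
base = PureWindowsPath(r"C:\Hello\World")

# Pair up the two columns and join each pair onto the base path
df["path"] = [base.joinpath(s, f) for s, f in zip(df["subdir"], df["filename"])]

print(df[["path", "text"]])
```

Whether this is actually faster than .apply depends on the data, so it is worth timing both on a realistic number of rows.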
Upvotes: 1
Reputation: 150785
Since you are using Path, you can just do:
example_path/df.filename
Output (my system is Linux):
0    C:\Hello\World/9.wav
1    C:\Hello\World/8.wav
2    C:\Hello\World/7.wav
Name: filename, dtype: object
Note that string operations are usually not vectorized, so the code above may well be just a wrapper around a for loop.
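The one-liner above joins only the filename. If the loop is going to happen anyway, it can be written explicitly so that the subdir is included and the text is collected in the same pass. The following is a rough sketch, not a benchmarked solution: it uses itertuples, which yields lightweight namedtuples rather than a Series per row. The names base and records, and the use of PureWindowsPath (so the result does not depend on the OS), are assumptions of mine.

```python
import pandas as pd
from pathlib import PureWindowsPath

data = {'subdir': ['tom', 'phil', 'ava'],
        'filename': ['9.wav', '8.wav', '7.wav'],
        'text': ['Pizza', 'Strawberries and yogurt', 'potato']}
df = pd.DataFrame(data, columns=['subdir', 'filename', 'text'])

# Assumption: a pure Windows path, so the separators match on every OS
base = PureWindowsPath(r"C:\Hello\World")

# One pass over the rows: build the full path and grab the text together
records = {
    str(base / row.subdir / row.filename): row.text
    for row in df.itertuples(index=False)
}

for path, text in records.items():
    print(path)
    print(text)
```

The resulting dictionary maps each full path to its text and can be turned back into a dataframe later if needed.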
Upvotes: 1