Reputation: 625
I use this to make a giant dataframe from many files in a directory:
path = r'C:\\Users\\me\\data\\'
all_files = glob.glob(os.path.join(path, "*"))
df_from_each_file = (pd.read_csv(f, sep='\t') for f in all_files)
concatdf = pd.concat(df_from_each_file, ignore_index=True)
The files in that path have names like
AAA.etc.etc.
AAA.etc.etc
BBB.etc.etc.
As I import each file, I want to add a column to the dataframe that has AAA
or BBB
next to all the rows imported from that file, like this:
col1 col2 col3
data1 data2 AAA
data3 data4 AAA
data1 data2 AAA
data3 data4 AAA
data1 data2 BBB
data3 data4 BBB
Upvotes: 0
Views: 1097
Reputation: 323226
You may check with keys
+ reset_index
key=[PureWindowsPath(i).name.split('.', 1)[0] for i in all_files]
concatdf = pd.concat(df_from_each_file, ignore_index=True,keys=key).reset_index(level=0)
Upvotes: 1
Reputation: 1704
I usually change the current working directory to the path:
import os
os.chdir(path)
You can assign col3
to be the part of the filename you wish by using assign
.
df_from_each_file = (pd.read_csv(f, sep='\t').assign(col3=f.split('.')[0]) for f in all_files)
So your code would look like:
os.chdir(path)
all_files = glob.glob('*')
df_from_each_file = (pd.read_csv(f).assign(col3=f.split('.')[0]) for f in all_files)
concatdf = pd.concat(df_from_each_file, ignore_index=True)
If you don't want to change the current working directory, then you could use os.path.basename(path)
to get the filenames in the path. so your code would look like:
all_files = glob.glob('*')
df_from_each_file = (pd.read_csv(f).assign(col3=os.path.basename(f).split('.')[0]) for f in all_files)
concatdf = pd.concat(df_from_each_file, ignore_index=True)
Upvotes: 0
Reputation: 14216
This is one way to do it:
from pathlib import PureWindowsPath
def fn_helper(fn):
df = pd.read_csv(fn, sep='\t')
p = PureWindowsPath(fn)
part = p.name.split('.')[0]
df['col3'] = part
return df
df_from_each_file = (fn_helper(f) for f in all_files)
...
Or as other people are showing with one-liners:
(pd.read_csv(f, sep='\t').assign(col3=PureWindowsPath(f).name.split('.')[0]) for f in all_files)
Upvotes: 2