I can read one .ann file into a pandas DataFrame as follows:
df = pd.read_csv('something/something.ann', sep=r'^([^\s]*)\s', engine='python', header=None).drop(0, axis=1)
df.head()
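This separator works because, with engine='python', pandas splits each line with the regular expression, roughly as re.split would. Because the pattern has a capturing group, the captured annotation ID is kept as its own field. The sketch below applies the same pattern to one made-up brat-style line. The sample line is an assumption, and real .ann files can also contain relation lines with fewer fields.

```python
import re

# The separator from the question: the first run of non-whitespace (the annotation ID)
# is captured, and the single whitespace character right after it is the split point.
pattern = re.compile(r'^([^\s]*)\s')

# Hypothetical brat-style line: ID <tab> type and offsets <tab> covered text
line = 'T1\tOrganization 0 4\tSony'

parts = pattern.split(line)
print(parts)  # ['', 'T1', 'Organization 0 4\tSony']
```

The empty first field is the column that .drop(0, axis=1) removes. The second tab is left alone because ^ (without re.MULTILINE) only matches at the start of the string.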
But I don't know how to read multiple .ann files into one DataFrame. I tried using concat, but the result was not what I expected.
How can I read many .ann files into one pandas DataFrame?
It sounds like you need to use glob to pull in all the .ann files from a folder and add them to a list of DataFrames. After that, you probably want to join, merge or concatenate them as required.
I don't know your exact requirements, but the code below should get you close. As it stands, the script assumes there is a subfolder called files in the directory you run it from, and it pulls in every .ann file in that folder (it ignores everything else). Review and adjust it as required; it is commented line by line.
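The sketch below shows the "only .ann files" behaviour in isolation. It creates a throwaway folder with two .ann files and one .txt file and runs the same glob pattern over it. The folder and file names are made up for the example. glob.glob makes no ordering guarantee (the order depends on the file system), so the result is sorted here.

```python
import glob
import os
import tempfile

# Build a temporary folder standing in for ./files (hypothetical file names).
with tempfile.TemporaryDirectory() as path:
    for name in ('doc1.ann', 'doc2.ann', 'doc1.txt'):
        with open(os.path.join(path, name), 'w') as fh:
            fh.write('T1\tOrganization 0 4\tSony\n')

    # Same pattern as the answer: only names ending in .ann match.
    all_files = sorted(glob.glob(os.path.join(path, '*.ann')))
    found = [os.path.basename(f) for f in all_files]

print(found)  # ['doc1.ann', 'doc2.ann']
```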
import glob
import os

import pandas as pd

path = r'./files'  # use your path
all_files = glob.glob(os.path.join(path, '*.ann'))
# create an empty list to hold the dataframes from the files found
dfs = []
# for each file in the path above ending in .ann
for file in all_files:
    # read the file (raw string so the regex backslashes are not treated as string escapes)
    df = pd.read_csv(file, sep=r'^([^\s]*)\s', engine='python', header=None).drop(0, axis=1)
    # add this new (temporary, during the loop) frame to the end of the list
    dfs.append(df)
# at this point you have a list of frames, one per .ann file, like [annFile1, annFile2, etc.] (just not those names)
# handle an empty list
if len(dfs) == 0:
    print('No files found.')
    # create an empty dummy frame
    df = pd.DataFrame()
# or, with only one frame, take it out of the list
elif len(dfs) == 1:
    df = dfs[0]
# or concatenate more than one frame together
else:  # modify this join as required
    # ignore_index=True already renumbers the rows 0..n-1, so no extra reset_index is needed
    df = pd.concat(dfs, ignore_index=True)
# check what you've got
print(df.head())
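The empty-list branch is the one that matters. pd.concat refuses an empty list and raises a ValueError, but it handles a one-element list without complaint, so the elif branch is a convenience rather than a requirement. The quick check below uses a tiny made-up frame.

```python
import pandas as pd

# pd.concat on an empty list raises ValueError ("No objects to concatenate").
try:
    pd.concat([])
    empty_raised = False
except ValueError:
    empty_raised = True

# A single-element list is fine: the result has the same rows and columns as that frame.
one = pd.DataFrame({1: ['T1'], 2: ['Organization 0 4\tSony']})
single = pd.concat([one], ignore_index=True)

print(empty_raised)  # True
print(single.shape)  # (1, 2)
```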
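The question doesn't say what was wrong with the concat result. One plausible cause, and this is a guess, is that the combined frame no longer shows which file each row came from. The sketch below tags every frame with a hypothetical source column before concatenating. It writes two throwaway .ann files whose names and contents are invented for the example, and it reuses the question's separator.

```python
import glob
import os
import tempfile

import pandas as pd

SEP = r'^([^\s]*)\s'  # separator from the question

with tempfile.TemporaryDirectory() as path:
    # Hypothetical .ann files standing in for real brat annotations.
    samples = {
        'a.ann': 'T1\tOrganization 0 4\tSony\n',
        'b.ann': 'T1\tPerson 0 4\tJohn\nT2\tLocation 10 15\tTokyo\n',
    }
    for name, text in samples.items():
        with open(os.path.join(path, name), 'w') as fh:
            fh.write(text)

    dfs = []
    for f in sorted(glob.glob(os.path.join(path, '*.ann'))):
        one = pd.read_csv(f, sep=SEP, engine='python', header=None).drop(0, axis=1)
        # remember which file each row came from ('source' is a made-up column name)
        dfs.append(one.assign(source=os.path.basename(f)))

    df = pd.concat(dfs, ignore_index=True)

print(df)
```

Because each row carries its file name, you can still filter or group by file after everything is in one DataFrame, for example with df.groupby('source').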