Reputation: 1
I have a csv file that carries outputs of some processes over video frames. In the file, each line is either fire or none. Each line has a startTime and an endTime. Now I need to cluster and print only one instance out of continuous fires, with their start and end time. The point is that a few none lines in the middle can also be tolerated if their time span is within 1 second. So to be clear, the whole point is to cluster detections from nearby frames together... to somehow smooth out the results. Instead of multiple lines like 31-32, 32-33, ..., have a single line with 31-35 seconds.
How can I do that?
For instance, all of the following continuous items are considered a single one, since the none gaps are within 1s. So we would have something like 1,file1,name1,30.6,32.2,fire,0.83, with that score being the mean of all the fire lines.
frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
...
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344
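(As a quick sanity check, the 0.83 in the merged example line is just the arithmetic mean of the eight fire scores above:

fire_scores = [0.914617, 0.993345, 0.991015, 0.983197, 0.979572, 0.985898, 0.412836, 0.383344]
print(sum(fire_scores) / len(fire_scores))   # ~0.8305
)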
This is my attempt so far:
with open(filename) as fin:
    lastWasFire=False
    for line in fin:
        if "fire" in line:
            if lastWasFire==False and line !="" and line.split(",")[5] != lastline.split(",")[5]:
                fout.write(line)
        else:
            lastWasFire=False
        lastline=line
Upvotes: 1
Views: 162
Reputation: 11613
This is close to what you are looking for and may be an acceptable alternative.
If your sample rate is fairly stable (the rows are spaced about 0.12 s apart, i.e. roughly 8 Hz), then you can convert the 1-second tolerance into an equivalent number of samples that may be 'none'. Let's say that's 8.
This code reads in the data and forward-fills up to 8 consecutive 'none' values with the last valid value.
import numpy as np
import pandas as pd

def groups_of_true_values(x):
    """Returns array of integers where each True value in x
    is replaced by the count of the group of consecutive
    True values that it was found in.
    """
    return (np.diff(np.concatenate(([0], np.array(x, dtype=int)))) == 1).cumsum() * x

df = pd.read_csv('test.csv', index_col=0)

# Forward-fill the 'none' values to a limit of 8 samples
df['filled'] = df['object'].replace('none', np.nan).ffill(limit=8)

# Find the groups of consecutive fire values
df['group'] = groups_of_true_values(df['filled'] == 'fire')

# Produce sum of scores by group
group_scores = df[['group', 'score']].groupby('group').sum()
print(group_scores)

# Find firing start and stop times
df['start'] = ((df['filled'] == 'fire') & (df['filled'].shift(1) == 'none'))
df['stop'] = ((df['filled'] == 'none') & (df['filled'].shift(1) == 'fire'))
start_times = df.loc[df['start'], 'startTime'].to_list()
stop_times = df.loc[df['stop'], 'startTime'].to_list()
print(start_times, stop_times)
Output:
score
group
1 10.347362
[] []
Hopefully, the output would be more interesting if there were longer sequences of no firing...
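If the frame spacing isn't known ahead of time, the fill limit can also be estimated from the data instead of hardcoding 8. A minimal sketch, assuming the same df as above and the 1-second tolerance from the question:

# estimate the typical spacing between frames and convert the 1 s tolerance into a sample count
step = df['startTime'].diff().median()        # about 0.12 s for the sample data
limit = max(1, int(round(1.0 / step)))        # roughly 8 samples may be forward-filled
df['filled'] = df['object'].replace('none', np.nan).ffill(limit=limit)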
Upvotes: 1
Reputation: 11234
I assume you don't want to use external libraries for data processing like numpy
or pandas
. The following code should be quite similar to your attempt:
threshold = 1.0

# We will chain a "none" object at the end which triggers the threshold to make sure
# no "fire" objects are left unprinted; its times lie far past the data so the span
# check below always exceeds the threshold, even if the file ends with "none" lines
from itertools import chain
trigger = (",,,1e9,{},,none,".format(1e9 + threshold + 1),)

# Keys for columns of input data
keys = (
    "frame_num",
    "uniqueId",
    "title",
    "startTime",
    "endTime",
    "startTime_fmt",
    "object",
    "score",
)

# Store last "fire" or "none" objects
last = {
    "fire": [],
    "none": [],
}

with open(filename) as f:
    # Skip first line of input file
    next(f)
    for line in chain(f, trigger):
        line = dict(zip(keys, line.split(",")))
        last[line["object"]].append(line)
        # Check threshold for "none" objects if there are previous unprinted "fire" objects
        if line["object"] == "none" and last["fire"]:
            if float(last["none"][-1]["endTime"]) - float(last["none"][0]["startTime"]) > threshold:
                print("{},{},{},{},{},{},{},{}".format(
                    last["fire"][0]["frame_num"],
                    last["fire"][0]["uniqueId"],
                    last["fire"][0]["title"],
                    last["fire"][0]["startTime"],
                    last["fire"][-1]["endTime"],
                    last["fire"][0]["startTime_fmt"],
                    last["fire"][0]["object"],
                    sum([float(x["score"]) for x in last["fire"]]) / len(last["fire"]),
                ))
                last["fire"] = []
        # Previous "none" objects don't matter anymore as soon as a "fire" object is encountered
        if line["object"] == "fire":
            last["none"] = []
The input file is processed line by line, and "fire" objects are accumulated in last["fire"]. They are merged and printed either when the "none" objects in last["none"] span more than the threshold defined in threshold, or when the end of the input file is reached thanks to the manually chained trigger object, which is a "none" object of length threshold + 1 placed far past the data, therefore always triggering the threshold and the final merge and print.
You could replace print with a call to write into an output file, of course.
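A minimal sketch of that swap, assuming a hypothetical output path merged.csv and reusing the fields already collected in last["fire"]:

# inside the threshold branch, build the merged line and append it to an output file
merged_line = "{},{},{},{},{},{},{},{}".format(
    last["fire"][0]["frame_num"],
    last["fire"][0]["uniqueId"],
    last["fire"][0]["title"],
    last["fire"][0]["startTime"],
    last["fire"][-1]["endTime"],
    last["fire"][0]["startTime_fmt"],
    last["fire"][0]["object"],
    sum(float(x["score"]) for x in last["fire"]) / len(last["fire"]),
)
with open("merged.csv", "a") as fout:   # "merged.csv" is just an example name
    fout.write(merged_line + "\n")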
Upvotes: 1
Reputation: 1234
My approach, using pandas and groupby:
- combine continuous lines of the same object (fire or none) into a spell
- remove the none spells that last less than 1 second
- combine the remaining continuous spells of the same object (fire or none) into a superspell, and calculate the corresponding score
I assume the data is sorted by time (otherwise we need to add a sort after reading the data). The trick to combining continuous lines of the same object into spells/superspells is: first, identify where a new spell/superspell starts (i.e. when the object type changes), and second, assign a unique id to each spell (equal to the number of new spells before it).
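As a toy illustration of that shift/cumsum trick (a standalone sketch on a made-up series, separate from the actual data below):

import pandas as pd
s = pd.Series(['fire', 'fire', 'none', 'none', 'fire'])
new_spell = s != s.shift(1)              # True where the object type changes
print(new_spell.cumsum().tolist())       # [1, 1, 2, 2, 3] -> one id per spell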
import pandas as pd
# preparing the test data
data = '''frame_num,uniqueId,title,startTime,endTime,startTime_fmt,object,score
10,file1,name1,30.6,30.64,0:00:30,fire,0.914617
11,file1,name1,30.72,30.76,0:00:30,none,0.68788
12,file1,name1,30.84,30.88,0:00:30,fire,0.993345
13,file1,name1,30.96,31,0:00:30,fire,0.991015
14,file1,name1,31.08,31.12,0:00:31,fire,0.983197
15,file1,name1,31.2,31.24,0:00:31,fire,0.979572
16,file1,name1,31.32,31.36,0:00:31,fire,0.985898
17,file1,name1,31.44,31.48,0:00:31,none,0.961606
18,file1,name1,31.56,31.6,0:00:31,none,0.685139
19,file1,name1,31.68,31.72,0:00:31,none,0.458374
20,file1,name1,31.8,31.84,0:00:31,none,0.413711
21,file1,name1,31.92,31.96,0:00:31,none,0.496828
22,file1,name1,32.04,32.08,0:00:32,fire,0.412836
23,file1,name1,32.16,32.2,0:00:32,fire,0.383344'''
with open("a.txt", 'w') as f:
print(data, file=f)
df1 = pd.read_csv("a.txt")
# mark new spell (the start of a series of continuous lines of the same object)
# new spell if the current object is different from the previous object
df1['newspell'] = df1.object != df1.object.shift(1)
# give each spell a unique spell number (equal to the total number of new spell before it)
df1['spellnum'] = df1.newspell.cumsum()
# group lines from the same spell together
spells = df1.groupby(by=["uniqueId", "title", "spellnum", "object"]).agg(
    first_frame = ('frame_num', 'min'),
    last_frame = ('frame_num', 'max'),
    startTime = ('startTime', 'min'),
    endTime = ('endTime', 'max'),
    totalScore = ('score', 'sum'),
    cnt = ('score', 'count')).reset_index()
# remove non-fire ('none') spells with a duration of less than 1 second
spells = spells[(spells.object == 'fire') | (spells.endTime > spells.startTime + 1)]
# Now group continuous fire spells into superspells
# mark new superspell
spells['newsuperspell'] = spells.object != spells.object.shift(1)
# give each superspell a unique number
spells['superspellnum'] = spells.newsuperspell.cumsum()
superspells = spells.groupby(by=["uniqueId", "title", "superspellnum", "object"]).agg(
    first_frame = ('first_frame', 'min'),
    last_frame = ('last_frame', 'max'),
    startTime = ('startTime', 'min'),
    endTime = ('endTime', 'max'),
    totalScore = ('totalScore', 'sum'),
    cnt = ('cnt', 'sum')).reset_index()
superspells['score'] = superspells.totalScore/superspells.cnt
superspells.drop(columns=['totalScore', 'cnt'], inplace=True)
print(superspells.to_csv(index=False))
# output
#uniqueId,title,superspellnum,object,first_frame,last_frame,startTime,endTime,score
#file1,name1,1,fire,10,23,30.6,32.2,0.8304779999999999
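Note that a 'none' spell longer than 1 second would survive the duration filter and appear as its own superspell in the output. If only the merged fire clusters should be printed, one extra filter (not part of the output above) would do:

# keep only the merged fire clusters before printing
superspells = superspells[superspells.object == 'fire']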
Upvotes: 0