Reputation: 121
I have a csv that looks like this:
'-' 'd' '5'
'-' 'd' '9'
'-' 'v' '15'
'-' 's' '8'
'-' 's' '10'
'-' 'q' '3'
I would like to split the data frame every time the number on the last column decreases and save into new file
The output would look like this:
File1:
'-' 'd' '5'
'-' 'd' '9'
'-' 'v' '15'
File 2
'-' 's' '8'
'-' 's' '10'
File 3
'-' 'q' '3'
Upvotes: 0
Views: 1238
Reputation: 195408
df = pd.read_csv(
"your_file.txt",
quotechar="'",
quoting=1,
sep=r"\s+",
header=None,
)
for _, g in df.groupby(df[2].diff().le(0).cumsum()):
print(g.to_csv(index=None, quotechar="'", quoting=1, sep=" ", header=None))
Note: To write to files, add filename to .to_csv()
. Without the filename, it writes groups to screen.
Prints:
'-' 'd' '5'
'-' 'd' '9'
'-' 'v' '15'
'-' 's' '8'
'-' 's' '10'
'-' 'q' '3'
Upvotes: 0
Reputation: 7045
Using Series.shift
you can compare rows that are next to one another.
from io import StringIO
import pandas as pd
s = """'-' 'd' '5'
'-' 'd' '9'
'-' 'v' '15'
'-' 's' '8'
'-' 's' '10'
'-' 'q' '3'""".replace("'", "")
df = pd.read_csv(StringIO(s), sep="\s\s+", engine="python", names=["a", "b", "c"])
# preserve the original column names
cols = df.columns
# create a grouping column
df["d"] = (df["c"].shift(fill_value=0) > df["c"]).cumsum()
# output each group
for name, group in df.groupby("d"):
group[cols].to_csv(f"output_{name}.csv", index=False)
Which produces these groups:
a b c
0 - d 5
1 - d 9
2 - v 15
-----------
a b c
3 - s 8
4 - s 10
-----------
a b c
5 - q 3
-----------
Upvotes: 0
Reputation: 3194
I am going to assume that your CSV files look like typical CSV files for simplicity:
-,d,5
-,d,9
...
I am also assuming the numbers in the last column are always positive integers.
prev = 0
accumulatedLines = []
decreasedCount = 0
with open("my_file.txt", "r") as fin:
for line in fin:
values = line.split(",")
if int(values[2]) < prev:
with open("File{}.txt".format(decreasedCount + 1), "w") as fout:
fout.writelines(accumulatedLines)
decreasedCount += 1
accumulatedLines = []
accumulatedLines.append(line)
prev = int(values[2])
Essentially, we iterate over each line in the input file, splitting it on the comma delimiter, and keep track of the value of the last column on the previous line. We also accumulate the lines read up to the current line. If the current line's value in the last column is strictly lesser than that of the previous line, we write the accumulated lines to a new file (named after the number of times we have encountered a decreasing value so far). We then clear the accumulator (and increase the count).
Upvotes: 1