lima0

Reputation: 121

Python - Pandas: split a CSV into multiple files on a condition

I have a csv that looks like this:

'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'
'-'  's'  '8'
'-'  's'  '10'
'-'  'q'  '3'

I would like to split the data frame every time the number in the last column decreases, and save each piece into a new file. The output would look like this:
File 1:

'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'

File 2:

'-'  's'  '8'
'-'  's'  '10'

File 3:

'-'  'q'  '3'

Upvotes: 0

Views: 1238

Answers (3)

Andrej Kesely

Reputation: 195408

  1. Read the CSV:

import pandas as pd

df = pd.read_csv(
    "your_file.txt",
    quotechar="'",  # values are wrapped in single quotes
    quoting=1,      # csv.QUOTE_ALL
    sep=r"\s+",     # whitespace-separated columns
    header=None,
)

  2. Print the groups:

# start a new group wherever the value in the last column does not increase
for _, g in df.groupby(df[2].diff().le(0).cumsum()):
    print(g.to_csv(index=None, quotechar="'", quoting=1, sep=" ", header=None))

Note: To write to files, pass a filename to .to_csv(); without one, each group is printed to the screen. A minimal file-writing sketch follows the printed output below.


Prints:

'-' 'd' '5'
'-' 'd' '9'
'-' 'v' '15'

'-' 's' '8'
'-' 's' '10'

'-' 'q' '3'
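
For example, a minimal sketch of the file-writing variant (the File{i}.txt naming is only an illustrative choice, not part of the answer above):

# continuing from the df read in step 1: one output file per group
for i, (_, g) in enumerate(df.groupby(df[2].diff().le(0).cumsum()), start=1):
    g.to_csv(f"File{i}.txt", index=False, quotechar="'", quoting=1, sep=" ", header=False)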

Upvotes: 0

Alex

Reputation: 7045

Using Series.shift you can compare each row with the one directly before it.

from io import StringIO
import pandas as pd

s = """'-'  'd'  '5'
'-'  'd'  '9'
'-'  'v'  '15'
'-'  's'  '8'
'-'  's'  '10'
'-'  'q'  '3'""".replace("'", "")
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python", names=["a", "b", "c"])

# preserve the original column names
cols = df.columns
# create a grouping column: increment the group id whenever the value in "c" drops
df["d"] = (df["c"].shift(fill_value=0) > df["c"]).cumsum()

# output each group
for name, group in df.groupby("d"):
    group[cols].to_csv(f"output_{name}.csv", index=False)

Which produces these groups (shown here with the DataFrame index; the written CSV files omit it):

   a  b   c
0  -  d   5
1  -  d   9
2  -  v  15
-----------
   a  b   c
3  -  s   8
4  -  s  10
-----------
   a  b  c
5  -  q  3
-----------
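
To see what the shift comparison builds before grouping, a quick check (continuing from the df above; the commented values are what this sample data yields):

# previous value of "c" (first row filled with 0) compared against the current value
print((df["c"].shift(fill_value=0) > df["c"]).tolist())
# [False, False, False, True, False, True]

# running count of decreases = the grouping column "d"
print(df["d"].tolist())
# [0, 0, 0, 1, 1, 2]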

Upvotes: 0

zr0gravity7

Reputation: 3194

I am going to assume that your CSV files look like typical CSV files for simplicity:

-,d,5
-,d,9
...

I am also assuming the numbers in the last column are always positive integers.

prev = 0
accumulatedLines = []
decreasedCount = 0
with open("my_file.txt", "r") as fin:
    for line in fin:
        values = line.split(",")
        if int(values[2]) < prev:
            # the value dropped: flush everything accumulated so far into the next file
            with open("File{}.txt".format(decreasedCount + 1), "w") as fout:
                fout.writelines(accumulatedLines)
            decreasedCount += 1
            accumulatedLines = []

        accumulatedLines.append(line)
        prev = int(values[2])

# write the lines accumulated after the last decrease
if accumulatedLines:
    with open("File{}.txt".format(decreasedCount + 1), "w") as fout:
        fout.writelines(accumulatedLines)

Essentially, we iterate over each line in the input file, splitting it on the comma delimiter, and keep track of the value of the last column on the previous line. We also accumulate the lines read so far. If the current line's value in the last column is strictly less than the previous line's, we write the accumulated lines to a new file (named after the number of decreases encountered so far), clear the accumulator, and increase the count. After the loop, whatever is still accumulated is written out as the final file.
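
If the file actually uses the quoted, whitespace-separated layout shown in the question rather than commas, only the parsing of each line needs to change; a small sketch of that assumption:

# parsing one line of the question's quoted, whitespace-separated format
line = "'-'  'd'  '5'\n"
values = line.split()                # ["'-'", "'d'", "'5'"]
number = int(values[2].strip("'"))   # 5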

Upvotes: 1
