Haifeng Zhang

Reputation: 31895

Can anyone provide a sed or awk way to delete the last two columns of a CSV file?

Edit: Hello all, thanks for your replies. My question is not how to solve the sample.csv I provided here. I have more than 100 similar files and I want to process them all quickly and efficiently. I solved the problem with Python, but I would prefer sed, because I know that sed can modify a file directly. I don't want to run similar commands hundreds of times.

I have files generated daily over about 4 months. Each file contains 9 columns, and I want to remove the last two columns from all of them.

I was planning to use sed with -i to delete the last 2 columns, so that I could modify all the files directly without writing new ones. Unfortunately, I could not find a way to do so, so I wrote a Python script to do the work. Here's my code:

def remove_last_two_columns(input_dir, output_dir, file_name):
    # Copy file_name from input_dir to output_dir, keeping only the first 7 columns.
    with open(input_dir + file_name, "r") as inputs, \
            open(output_dir + file_name, "w") as writer:
        for line in inputs:
            parts = line.strip().split(",")
            # Each row has 9 columns, so the first 7 are everything but the last two.
            writer.write(",".join(parts[:7]) + "\n")


remove_last_two_columns("/home/haifzhan/input/", "/home/haifzhan/output/", "sample.csv")

input:

C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N,2014-07-01 00:37:22,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N,2014-07-01 00:25:55,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13,N,N
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24,N,N
C1,C2,2014-06-30 18:41:27,N,N,N,N,2014-07-01 00:52:22,N



my output:
C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N

Can anyone provide a sed/awk way to achieve this? I want to use sed/awk in future work. Thanks in advance.

Upvotes: 1

Views: 1806

Answers (3)

user3442743

Reputation:

Awk solution

awk 'BEGIN{FS=OFS=","}NF=(NF-2)' file
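Since the question is about many files that should be overwritten, one possible way to apply this command to each of them is a loop that writes to a temporary file and then replaces the original. This is only a sketch: the directory name demo_awk and the .csv suffix are assumptions made for the demo, not taken from the question, and rebuilding the line by assigning to NF is assumed to work in the installed awk (it does in gawk and mawk).

```shell
#!/bin/sh
# Hypothetical demo directory holding one row from the question's sample.
mkdir -p demo_awk
printf '%s\n' 'C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N,N,N' > demo_awk/sample.csv

for f in demo_awk/*.csv; do
    # Write the trimmed rows to a temporary file, then replace the
    # original only if awk exited successfully.
    awk 'BEGIN{FS=OFS=","}NF=(NF-2)' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat demo_awk/sample.csv
```

Pointing the glob at the real directory would process every file in one run. Keeping a backup first is prudent, because the originals are overwritten.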

Upvotes: 5

Adam

Reputation: 4683

This statement removes the last two columns, where sample.csv is the name of the input file.

sed 's/,[^,]*,[^,]*$//g' sample.csv

My result was:

C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N

In your example you deleted the last 3 columns. You could do that by modifying the original statement as follows:

sed 's/,[^,]*,[^,]*,[^,]*$//g' sample.csv
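To modify many files directly, as the question asks, the two-column version can be combined with -i and a glob. A minimal sketch, assuming GNU sed; the directory demo_sed and the file names a.csv and b.csv are hypothetical and exist only for the demo:

```shell
#!/bin/sh
# Hypothetical demo directory with two rows taken from the question's sample.
mkdir -p demo_sed
printf '%s\n' 'C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13,N,N' > demo_sed/a.csv
printf '%s\n' 'C1,C2,2014-06-30 18:41:27,N,N,N,N,2014-07-01 00:52:22,N' > demo_sed/b.csv

# -i edits each file in place (GNU sed syntax; BSD/macOS sed expects -i '').
# The single quotes keep the shell from treating [^,]* as a filename pattern.
sed -i 's/,[^,]*,[^,]*$//' demo_sed/*.csv

cat demo_sed/a.csv demo_sed/b.csv
```

With GNU sed, -i.bak instead of -i keeps a copy of each original next to the edited file, which may be worth doing on the first run.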

Upvotes: 2

fivetentaylor

Reputation: 1287

cut is definitely the easiest tool to achieve this:

cut -d, -f8,9 --complement input

Note that --complement is a GNU extension that the cut shipped with OS X does not support, so on OS X it's best to install GNU coreutils (Homebrew installs its cut as gcut):

brew install coreutils
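If GNU cut is not available, selecting the columns to keep avoids --complement entirely. A sketch, assuming every row has exactly 9 columns as stated in the question; the directory demo_cut and the output file name are hypothetical:

```shell
#!/bin/sh
# Hypothetical demo directory holding one row from the question's sample.
mkdir -p demo_cut
printf '%s\n' 'C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N,N,N' > demo_cut/sample.csv

# Keep fields 1 through 7, which drops fields 8 and 9 of a 9-column row.
# cut cannot edit in place, so the result goes to a new file.
cut -d, -f1-7 demo_cut/sample.csv > demo_cut/sample.trimmed.csv

cat demo_cut/sample.trimmed.csv
```

Unlike the --complement form, this depends on the column count being fixed, so it would need adjusting for files with a different number of columns.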

Upvotes: 2
