Reputation: 31895
Edit: Hello all, thanks for your replies. My question is not how to process the sample.csv I provided here. I have more than 100 similar files and I want to process them all quickly and efficiently. I solved the problem with Python, but I prefer sed because it can modify a file in place. I don't want to run similar commands hundreds of times.
I have files generated daily over about 4 months. Each file contains 9 columns, and I want to remove the last two columns from all of them.
I was planning to use sed with -i to delete the last 2 columns, so that I could modify all the files directly without writing new ones. Unfortunately, I could not find a way to do so, so I wrote a Python script to do the work. Here's my code:
def remove_last_two_columns(input_dir, output_dir, file_name):
    # Copy the file, keeping only the first 7 of its 9 comma-separated columns.
    with open(input_dir + file_name, "r") as inputs, \
         open(output_dir + file_name, "w") as writer:
        for line in inputs:
            parts = line.strip().split(",")
            writer.write(",".join(parts[:7]) + "\n")

remove_last_two_columns("/home/haifzhan/input/", "/home/haifzhan/output/", "sample.csv")
input:
C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N,2014-07-01 00:37:22,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N,2014-07-01 00:25:55,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13,N,N
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24,N,N
C1,C2,2014-06-30 18:41:27,N,N,N,N,2014-07-01 00:52:22,N
my output:
C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N
Can anyone provide a sed/awk way to achieve this? I would like to use sed/awk in future work. Thanks in advance.
Upvotes: 1
Views: 1806
Reputation: 4683
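This statement removes the last two columns, where sample.csv is the name of the input file. The expression is quoted so that the shell does not try to expand the brackets and asterisks itself:
sed 's/,[^,]*,[^,]*$//g' sample.csv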
My result was:
C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N
If you need to delete the last 3 columns instead, extend the original statement with one more field:
sed 's/,[^,]*,[^,]*,[^,]*$//g' sample.csv
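Since the question is about changing more than 100 files without writing new ones, the same substitution can be combined with sed's -i option and a glob. The following is a minimal sketch, not a tested recipe for your data: the function name strip_last_two_columns and the demo directory are made up for illustration, and the -i.bak form is used on the assumption that keeping a backup copy of each file is acceptable (it is also the spelling accepted by both GNU and BSD sed).

```shell
# Hypothetical helper: remove the last two comma-separated fields from
# every line of each file given as an argument.
strip_last_two_columns() {
    # -i.bak edits each file in place and keeps the original as FILE.bak
    sed -i.bak 's/,[^,]*,[^,]*$//' "$@"
}

# Demo on a throwaway copy of one line from the sample input.
demo_dir=$(mktemp -d)
printf '%s\n' 'C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N,N,N' > "$demo_dir/sample.csv"
strip_last_two_columns "$demo_dir"/*.csv
cat "$demo_dir/sample.csv"
# prints: C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
```

Once the results have been checked, the .bak files can be deleted.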
Upvotes: 2
Reputation: 1287
cut is probably the easiest tool to achieve this:
cut -d, -f8,9 --complement input
Note that --complement is a GNU extension that the cut shipped with OS X does not support, so on a Mac it's best to install GNU coreutils (Homebrew installs its cut under the name gcut):
brew install coreutils
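For these nine-column files, dropping fields 8 and 9 is the same as keeping fields 1 to 7, which any POSIX cut can do without --complement. cut has no in-place mode, so the sketch below writes to a temporary file and copies the result back. The function name keep_first_seven_columns and the demo directory are made up for illustration, and the sketch assumes every file really has nine columns.

```shell
# Hypothetical helper: keep only columns 1-7 of each comma-separated file.
keep_first_seven_columns() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # cut cannot edit in place: write the result to a temp file first,
        # then copy it back so the original file keeps its permissions.
        if cut -d, -f1-7 "$f" > "$tmp"; then
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}

# Demo on a throwaway copy of one line from the sample input.
demo_dir=$(mktemp -d)
printf '%s\n' 'C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13,N,N' > "$demo_dir/sample.csv"
keep_first_seven_columns "$demo_dir"/*.csv
cat "$demo_dir/sample.csv"
# prints: C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
```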
Upvotes: 2