Reputation: 529
I have a single CSV string of 50k values, in the format 23445, 23446, 24567, ..., etc. I want to create a wrapper script that breaks it into batches of 500 and passes each batch to a script that accepts it as input.
input.csv (50k comma-separated values)
script (takes a batch of 500), throttles for 60 seconds, then takes the next 500 values.
#!/bin/bash
# Attempt so far: this handles one value at a time rather than a batch of 500.
sed -n '1p' input.csv | tr ',' '\n' | while read -r word; do
    script_accpts_batch_of_500=$word
done
Upvotes: 0
Views: 90
Reputation: 2374
Without knowing much about the setting in which you'll launch this, here's a more self-contained script. The shell function that uses awk (splitcsv) is one way to split a very long CSV line into somewhat shorter CSV lines; it is surrounded by some functions that generate test input and simulate processing.
This use of awk leaves the record separator (RS) alone and instead sets the field separator (FS) via awk's -F option. If splitcsv is given several long CSV lines, it therefore processes all of them: for each long line it emits as many 500-field lines as possible, then, if any fields remain, one short line (fewer than 500 fields) before moving on to the next long line.
But you only asked for one long line to be processed, so I'm stopping here.
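#!/usr/bin/env bash

# Generate test input, split it into lines of n fields, and process each line.
stepdown_csv() {
    local n=500
    [[ $# -eq 1 ]] && n="$1"
    generate50000 |
        splitcsv "$n" |
        while IFS= read -r line; do
            process_csv_line "$line"
        done
}

# Simulate processing of one batch: count the fields in the line.
process_csv_line() {
    local unsep
    unsep=$(sed 's/,/ /g' <<< "$1")
    if [[ "$unsep" != '' ]]; then
        set -- $unsep    # unquoted on purpose: word-split into positional parameters
        echo "Got a CSV line with $# fields"
        # sleep 60
    fi
}

# Split each long CSV line into lines of at most flds fields.
splitcsv() {
    awk -F , -v flds="$1" '{
        for (n = 1; n <= NF; n++) {
            printf "%s%s", $n, (n % flds == 0 || n == NF ? "\n" : ",")
        }
    }'
}

# Emit a single line of 50000 comma-separated random numbers.
generate50000() {
    local n
    for n in {1..50000}; do
        echo -n "$RANDOM"
        if [[ n -lt 50000 ]]; then
            echo -n ,
        else
            echo
        fi
    done
}

stepdown_csv "$@"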
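To make the multi-line behaviour described above easier to see, here is a small sketch that runs the same awk program on two short "long" lines with a batch size of 3 instead of 500. The demo input is made up for illustration; only splitcsv comes from the script.

```shell
# Same awk program as splitcsv in the script; only the demo input and
# the batch size differ.
splitcsv() {
    awk -F , -v flds="$1" '{
        for (n = 1; n <= NF; n++) {
            printf "%s%s", $n, (n % flds == 0 || n == NF ? "\n" : ",")
        }
    }'
}

# Two input lines of 7 and 4 fields, split into lines of at most 3 fields.
printf '1,2,3,4,5,6,7\n8,9,10,11\n' | splitcsv 3
# 1,2,3
# 4,5,6
# 7
# 8,9,10
# 11
```

Each input line is split independently, so a short remainder line (here 7 and 11) appears at the end of every input line, not just at the end of the file.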
Upvotes: 1
Reputation: 67467
Another awk solution:
$ awk -v RS=, '{ORS=NR%500?RS:"\n"}1' file
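One thing to watch with RS set to a comma: the newline at the end of the file becomes part of the last record. As far as I can tell, that means the command above prints an extra blank line when the value count is an exact multiple of 500, and ends with a stray comma otherwise. A possible variant that handles both cases, and also strips the spaces shown in the question's sample, is sketched below. The function name split_batches and the variables n, sep and count are mine, not part of the answer.

```shell
# split_batches N: read comma-separated values on stdin and print them
# N per line. Hypothetical helper; a sketch, not a drop-in replacement.
split_batches() {
    awk -v RS=, -v n="$1" '
        # Strip spaces and the newline that ends the last record.
        { gsub(/[ \n]/, "") }
        # Skip records that are empty after stripping.
        $0 == "" { next }
        {
            # Print the separator before each value, so nothing trails.
            printf "%s%s", sep, $0
            sep = (++count % n == 0) ? "\n" : ","
        }
        # Terminate the final line if anything was printed.
        END { if (count) printf "\n" }
    '
}

printf '1, 2,3,4,5,6,7\n' | split_batches 3
# 1,2,3
# 4,5,6
# 7
```

For the real data this would be called as split_batches 500 < file.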
Upvotes: 1
Reputation: 19982
You can combine several commands:
tr ',' '\n' < input.csv | paste -d, $(yes -- "- " | head -500)
You can also use one command:
awk 'BEGIN {RS=","} {if (NR%500==0) print $0; else printf "%s%s", $0, RS}' input.csv
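Both commands only produce the batches. The question also asks to hand each batch to a script and pause 60 seconds in between, so here is a rough sketch of that wrapper loop built on the tr/paste idea. The function name run_in_batches, its argument order, and the choice to pass each batch as a single comma-joined argument are all assumptions of mine; adjust them to whatever your script actually expects. It also assumes a paste that accepts as many "-" operands as the batch size, as GNU paste does.

```shell
# run_in_batches FILE SIZE PAUSE COMMAND [ARGS...]
# Hypothetical helper: splits the comma-separated values in FILE into
# batches of SIZE, runs COMMAND once per batch with the batch as its last
# argument, and sleeps PAUSE seconds after each call.
# The body is a subshell ( ... ) so its variables do not leak out.
run_in_batches() (
    file=$1 size=$2 pause=$3
    shift 3
    # One "-" per value in a batch: paste reads stdin once per operand.
    dashes=$(yes - | head -n "$size")
    # Pipeline: drop spaces, put one value per line, join SIZE lines with
    # commas ($dashes is unquoted on purpose), then strip the padding
    # commas that paste adds to a short final batch.
    tr -d ' ' < "$file" | tr ',' '\n' |
        paste -d, $dashes |
        sed 's/,*$//' |
        while IFS= read -r batch; do
            [ -n "$batch" ] || continue
            # Redirect stdin so COMMAND cannot consume the remaining batches.
            "$@" "$batch" < /dev/null
            sleep "$pause"
        done
)

# Example call for the question (./script stands for your own script):
# run_in_batches input.csv 500 60 ./script
```

The loop sleeps after every batch, including the last one; if that final pause matters, the batches could be counted first and the last sleep skipped.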
Upvotes: 1