Reputation: 79
I want to find the median of each column, but my code doesn't work the way I want. Given this input:
1 2 3
3 2 1
2 1 5
I'm expecting
2 2 3
as the result, but instead it just gives some error and some kind of "sum" of the column. Below is a snippet of my code for the "median of each column" part:
while read -r line; do
read -a array <<< "$line"
for i in "${!array[@]}"
do
column[${i}]=${array[$i]}
((length[${i}]++))
result=${column[*]} | sort -n
done < file
for i in ${!column[@]}
do
#some median calculation.....
Note: I want to practice Bash, which is why I wrote this in Bash. I would really appreciate it if someone could help me, especially with a Bash solution. Thank you.
Upvotes: 1
Views: 2984
Reputation: 55489
Bash is really not suitable for low-level text processing like this: the read command can end up doing a system call for each character that it reads (for example, when its input is a pipe), which means that it's slow and a CPU hog. It's ok for processing interactive input, but using it for general text processing is madness. It would be much better to use awk (or Python, Perl, etc.) for this.
As an exercise in learning about Bash I guess it's ok, but please try to avoid using read for bulk text processing in real programs. For further information, please see "Why is using a shell loop to process text considered bad practice?" on the Unix & Linux Stack Exchange site, especially the answer written by Stéphane Chazelas (the discoverer of the Shellshock Bash bug).
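For comparison, here is a rough sketch of how the same job might look with awk doing the text processing instead of a read loop. The function name column_medians_awk is made up for this illustration, and the sketch assumes every row has the same number of numeric fields and that the file has no blank lines. It uses a simple insertion sort so that it doesn't depend on any awk extensions.

```bash
#!/usr/bin/env bash
# column_medians_awk is a hypothetical helper name, not from the original post.
# It reads whitespace-separated numbers from the file named in $1 and prints the
# middle value of each column (the lower of the two middle values when the
# number of rows is even).
column_medians_awk() {
    awk '
    {
        # Remember every cell, keyed by (row, column).
        for (c = 1; c <= NF; c++) cell[NR, c] = $c
        if (NF > ncols) ncols = NF
    }
    END {
        for (c = 1; c <= ncols; c++) {
            # Copy column c into v[1..NR], forcing numeric values.
            for (r = 1; r <= NR; r++) v[r] = cell[r, c] + 0
            # Plain insertion sort, ascending.
            for (r = 2; r <= NR; r++) {
                x = v[r]
                for (k = r - 1; k >= 1 && v[k] > x; k--) v[k + 1] = v[k]
                v[k + 1] = x
            }
            # int((NR + 1) / 2) is the 1-based middle row.
            printf "%s%s", v[int((NR + 1) / 2)], (c < ncols ? " " : "\n")
        }
    }' "$1"
}
```

Calling column_medians_awk file on the sample data from the question should print 2 2 3.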
Anyway, to get back to your question... :)
Most of your code is ok, but
result=${column[*]} | sort -n
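doesn't do what you want it to. The assignment on the left of the pipe prints nothing, so sort receives empty input, and because each stage of a pipeline runs in its own subshell, result is never set in your main shell.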
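A small demonstration of the difference, as a sketch with made-up values (the array name column mirrors the question; sorted is just an illustrative variable name):

```bash
#!/usr/bin/env bash
# Illustrative values only; "column" mirrors the array name used in the question.
column=(3 1 2)

# Broken form: the assignment is the first stage of a pipeline, so it runs in a
# subshell and writes nothing to stdout. sort gets empty input, and result is
# still unset in this shell afterwards.
unset result
result=${column[*]} | sort -n
echo "after pipeline: '${result-}'"

# Working form: print one value per line, sort, and capture sort's output
# with command substitution.
sorted=$(printf '%s\n' "${column[@]}" | sort -n)
echo "after command substitution:"
echo "$sorted"
```

The first echo shows an empty string, while the second form leaves the values 1, 2, 3 in sorted, one per line.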
Here's one way to get the column medians in pure Bash:
#!/usr/bin/env bash
# Find medians of columns of numeric data
# See http://stackoverflow.com/q/33095764/4014959
# Written by PM 2Ring 2015.10.13
fname=$1
echo "input data:"
cat "$fname"
echo
#Read rows, saving into columns
numrows=1
while read -r -a array; do
    ((numrows++))
    for i in "${!array[@]}"; do
        #Separate column items with a newline
        column[i]+="${array[i]}"$'\n'
    done
done < "$fname"
#Calculate line number of middle value, which must be 1-based to use as the `head`
#argument, and must skip the blank line produced by the extra newline that the
#'here' string, `<<<`, adds (sort -n puts that blank line first for non-negative data)
midrow=$((1+numrows/2))
echo "midrow: $midrow"
#Get median of each column
result=''
for i in "${!column[@]}"; do
    median=$(sort -n <<<"${column[i]}" | head -n "$midrow" | tail -n 1)
    result+="$median "
done
echo "result: $result"
output
input data:
1 2 3
3 2 1
2 1 5
midrow: 3
result: 2 2 3
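The script above relies on the blank line sorting ahead of the data, which I'd only expect to hold when the values are non-negative, since sort -n treats a blank line as 0. Here is a sketch of a variant that sidesteps that by never giving sort a blank line and by indexing the sorted values directly. The function name column_medians is made up for this illustration, and it assumes Bash 4 or later for mapfile.

```bash
#!/usr/bin/env bash
# column_medians is a hypothetical name; it takes the data file as $1.
# Assumes Bash 4+ (for mapfile) and whitespace-separated numeric fields.
column_medians() {
    local -a row column sorted
    local i mid result=''
    # Read rows, appending each field to its column's newline-separated string.
    while read -r -a row; do
        for i in "${!row[@]}"; do
            column[i]+="${row[i]}"$'\n'
        done
    done < "$1"
    for i in "${!column[@]}"; do
        # printf '%s' avoids the extra newline a here-string would add,
        # so sort never sees a blank line.
        mapfile -t sorted < <(printf '%s' "${column[i]}" | sort -n)
        # 0-based middle index; for an even count this picks the lower middle value.
        mid=$(( (${#sorted[@]} - 1) / 2 ))
        result+="${sorted[mid]} "
    done
    # Drop the trailing space before printing.
    echo "${result% }"
}
```

On the sample data, column_medians file should print 2 2 3, and a column such as -1, -3, -2 should give -2.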
Upvotes: 3