Reputation: 3
After analysis of brain scans I ended up with around 1000 .csv files, one for each scan. I've merged them into one in order (by subject ID and date). My problem is, that some subjects had two or more consecutive scans and some had only one. Database now looks like that:
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530 //first scan of A
024_S_0985, 437.50, 204.80, 0.131074 //second scan of A
024_S_0985, 400.75, 198.80, 0.127420 //third scan of A
024_S_1063, 544.50, 214.34, 0.148939 //first and only scan of B
024_S_1171, 654.75, 240.33, 0.142453 //first scan of C
024_S_1171, 659.50, 242.21, 0.141269 //second scan of C
...
But I want it to look like that:
ID, CC_area, CC_perimeter, CC_circularity, CC_area2, CC_perimeter2, CC_circularity2, CC_area3, CC_perimeter3, CC_circularity3, ..., CC_circularity6
024_S_0985, 407.00, 192.15, 0.138530, 437.50, 204.80, 0.131074, 400.75, 198.80, 0.127420, ... ,
024_S_1063, 544.50, 214.34, 0.148939,,,,,, ...,
024_S_1171, 654.75, 240.33, 0.142453, 659.50, 242.21, 0.141269,,, ... ,
...
What is important, that order of data must not be changed and number of rows for one ID is not known (it varies from 1 to 6). (So first columns of scan 1, then scan 2 etc.). Could you help me, or provide, with solution for that using bash? I am not experienced in programming and I have lost hope, that I could do it myself.
Upvotes: 0
Views: 55
Reputation: 84561
You can combine the line with the same filename (or initial index) using a normal while read
loop and then acting on 3 conditions. (1) whether it is the first line following the header; (2) where the current index is equal to the last; and (3) where the current index differs from the last. There are a number of ways to approach this, but a short bash script could look like the following:
#!/bin/bash
fn="${1:-/dev/stdin}" ## accept filename or stdin
[ -r "$fn" ] || { ## validate file is readable
printf "error: file not found: '%s'\n" "$fn"
exit 1
}
declare -i cnt=0 ## flag for 1st iteration
while read -r line; do ## for each line in file
## read header, print & continue
[ ${line//,*/} = ID ] && printf "%s\n" "$line" && continue
line="${line// */}" ## strip //first scan of A....
idx=${line//,*/} ## parse file index from line
line="${line#*, }" ## strip index
if [ $cnt -eq 0 ]; then ## if first line - print
printf "%s, %s" "$idx" "$line"
((cnt++))
elif [ $idx = $lidx ]; then ## if indexes equal, append
printf ", %s" "$line"
else ## else, newline & print
printf "\n%s, %s" "$idx" "$line"
fi
last="$line" ## save last line
lidx=$idx ## save last index
done <"$fn"
printf "\n"
Input
$ cat dat/cmbcsv.dat
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530 //first scan of A
024_S_0985, 437.50, 204.80, 0.131074 //second scan of A
024_S_0985, 400.75, 198.80, 0.127420 //third scan of A
024_S_1063, 544.50, 214.34, 0.148939 //first and only scan of B
024_S_1171, 654.75, 240.33, 0.142453 //first scan of C
024_S_1171, 659.50, 242.21, 0.141269 //second scan of C
Output
$ bash cmbcsv.sh dat/cmbcsv.dat
ID, CC_area, CC_perimeter, CC_circularity
024_S_0985, 407.00, 192.15, 0.138530, 437.50, 204.80, 0.131074, 400.75, 198.80, 0.127420
024_S_1063, 544.50, 214.34, 0.148939
024_S_1171, 654.75, 240.33, 0.142453, 659.50, 242.21, 0.141269
Note: I didn't know whether you needed all the additional commas or ellipses or if they were just there to show there could be more of the same index (e.g. ,,...,
). You can easily add them if need be.
Upvotes: 1
Reputation: 132
well if you know which scan belongs to which person you can add an extra column like patient name or id, but I guess that's if you have that original info of how much scans per person
Upvotes: 0