Reputation: 45
I have a table called baitmap.txt. In $5, different gene names are separated by ",".
baitmap.txt
1 831895 848168 218 RP11-54O7.1-001
1 848169 850618 219 RP11-54O7.2-001
1 850619 874081 220 SAMD11-011,SAMD11-003,SAMD11-010,SAMD11-001,SAMD11-004
1 889424 903640 223 NOC2L-001
1 903641 927394 224 C1orf170-001,C1orf170-201
1 927395 936954 225 HES4-002,HES4-001,HES4-004
1 943677 957199 228 RP11-54O7.11-001
1 1005127 1034268 234 RNF223-201
1 1049052 1062659 239 C1orf159-002,C1orf159-001,C1orf159-004,C1orf159-009,C1orf159-011,C1orf159-017,C1orf159-016,C1orf159-203,C1orf159-201,C1orf159-204,C1orf159-202
1 1096739 1107115 246 MIR200B-201
I would like to split the names in $5 then merge the new column to the previous 4 columns separately. So it should look like this:
1 831895 848168 218 RP11-54O7.1-001
1 848169 850618 219 RP11-54O7.2-001
1 850619 874081 220 SAMD11-011
1 850619 874081 220 SAMD11-003
1 850619 874081 220 SAMD11-010
1 850619 874081 220 SAMD11-001
1 850619 874081 220 SAMD11-004
1 889424 903640 223 NOC2L-001
1 903641 927394 224 C1orf170-001
1 903641 927394 224 C1orf170-201
Upvotes: 2
Views: 67
Reputation: 7277
Another bash variant
while read line; do
first="${line% *}"
last="${line##* }"
printf "$first ${last//,/\\n$first }\n"
done < baitmap.txt
Upvotes: 0
Reputation: 212248
Since this question was originally tagged C
, it seems appropriate to do:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
static void
process_line(const char *line)
{
const char *b = line;
int p = 1;
for(int field = 0; *b; b++) {
if( p && !isspace(*b) && ++field == 5 ) {
break;
}
p = isspace(*b);
}
int len = b - line;
const char *s = b;
for( ; *b; b++ ) {
if( *b == ',' || *b == '\n' ) {
fwrite(line, 1, len, stdout);
fwrite(s, 1, b - s, stdout);
putchar('\n');
s = b + 1;
}
}
}
int
main(int argc, char *const argv[])
{
(void)argc;
(void)argv;
char *line = NULL;
size_t s;
while( getline(&line, &s, stdin) != -1 ) {
process_line(line);
}
free(line);
return 0;
}
Upvotes: 0
Reputation: 246799
With plain bash:
while read -ra fields; do
# split the last field into the "values" array
IFS=, read -ra values <<< "${fields[-1]}"
for val in "${values[@]}"; do
fields[-1]=$val
echo "${fields[*]}"
done
done < baitmap.txt
This requires bash v4.3+ to assign to fields[-1]
. To work with an older bash (such as /bin/bash on a Mac):
while read -ra fields; do
len=${#fields[@]}
IFS=, read -ra values <<< "${fields[len-1]}"
for val in "${values[@]}"; do
fields[len-1]=$val
echo "${fields[*]}"
done
done < baitmap.txt
Upvotes: 2
Reputation: 212248
awk '{split($5, a, ","); $5=""; for( k in a) print $0, a[k]}' baitmap.txt
Note that awk hashes the array so that will permute the order. If that matters, you can do:
awk '{n = split($5, a, ","); $5=""; for(k=1; k<=n; k++) print $0, a[k]}' baitmap.txt
Upvotes: 5
Reputation: 64682
I did this in a few lines of BAT (Dos / Windows only)
@echo off
for /F "tokens=1-5" %%a in (baitmap.txt) do (
for %%i in (%%e) do (
echo %%a %%b %%c %%d %%i
)
)
1 831895 848168 218 RP11-54O7.1-001
1 848169 850618 219 RP11-54O7.2-001
1 850619 874081 220 SAMD11-011
1 850619 874081 220 SAMD11-003
1 850619 874081 220 SAMD11-010
1 850619 874081 220 SAMD11-001
1 850619 874081 220 SAMD11-004
1 889424 903640 223 NOC2L-001
1 903641 927394 224 C1orf170-001
1 903641 927394 224 C1orf170-201
1 927395 936954 225 HES4-002
1 927395 936954 225 HES4-001
1 927395 936954 225 HES4-004
1 943677 957199 228 RP11-54O7.11-001
1 1005127 1034268 234 RNF223-201
1 1049052 1062659 239 C1orf159-002
1 1049052 1062659 239 C1orf159-001
1 1049052 1062659 239 C1orf159-004
1 1049052 1062659 239 C1orf159-009
1 1049052 1062659 239 C1orf159-011
1 1049052 1062659 239 C1orf159-017
1 1049052 1062659 239 C1orf159-016
1 1049052 1062659 239 C1orf159-203
1 1049052 1062659 239 C1orf159-201
1 1049052 1062659 239 C1orf159-204
1 1049052 1062659 239 C1orf159-202
1 1096739 1107115 246 MIR200B-201
Upvotes: 0