Chargaff

Reputation: 1572

Printing the first field and only the matching fields in a record, using awk

I really don't know whether awk is the appropriate tool for this task; maybe something in Python would be better. Anyway, I thought I'd ask here first whether the task is feasible. Here we go:

Data:

###

offspr84 175177 200172 312312 310326 338342 252240 226210 113129 223264
male28 197175 172200 308312 310338 262338 256252 190226 113129 223219
female13 197177 172172 312308 318326 342350 240248 210218 129113 267247

###

offspr85 181177 192160 320312 290362 358330 238238 214178 133129 263223
male65 197181 176192 320268 322286 358330 238244 206214 137133 267263
female17 181177 160172 280312 362346 350326 230238 126178 129129 223167

###

So basically I need to print the first field ($1) together with each field that is identical in the offspring row and a parent row: $9 in the first record, and $2 and $6 in the second record.

Output file:
offspr84 113129
male28 113129

offspr85 181177
female17 181177

offspr85 358330
male65 358330

Any hints on how I could accomplish that?

Thanks!

Upvotes: 0

Views: 571

Answers (4)

glenn jackman

Reputation: 246764

awk '
    /^offspr/ {
        for (i=1; i<=NF; i++) {
            offspr[i] = $i
        }
        next
    }
    {
        for (i=2; i<=NF; i++) {
            if ($i == offspr[i]) {
                print offspr[1] " " offspr[i]
                print $1 " " $i
                print ""
                break
            }
        }
    }
'
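
For readers who prefer Python, the idea behind the awk script above (remember the offspring row, then compare each parent row with it field by field at the same position) could be sketched roughly as follows. The function name `positional_matches` and the `sample` string are illustrative assumptions, not part of the original answer. Unlike the awk version, this sketch reports every matching column in a row rather than stopping at the first one.

```python
def positional_matches(lines):
    """Return (offspring, parent, value) for each column where a parent
    row holds the same value as the most recent offspring row."""
    matches = []
    offspring = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "###":
            continue  # skip blank lines and record separators
        if fields[0].startswith("offspr"):
            offspring = fields  # remember the offspring row
            continue
        if offspring is None:
            continue  # parent row seen before any offspring row
        # Compare the data columns pairwise (index 0 is the name).
        for child_value, parent_value in zip(offspring[1:], fields[1:]):
            if child_value == parent_value:
                matches.append((offspring[0], fields[0], child_value))
    return matches


# Sample input copied from the question.
sample = """\
###
offspr84 175177 200172 312312 310326 338342 252240 226210 113129 223264
male28 197175 172200 308312 310338 262338 256252 190226 113129 223219
female13 197177 172172 312308 318326 342350 240248 210218 129113 267247
###
offspr85 181177 192160 320312 290362 358330 238238 214178 133129 263223
male65 197181 176192 320268 322286 358330 238244 206214 137133 267263
female17 181177 160172 280312 362346 350326 230238 126178 129129 223167
###
"""

for child, parent, value in positional_matches(sample.splitlines()):
    print(child, value)
    print(parent, value)
    print()
```

As with the awk script, pairs come out in file order, so for the second record the male65 match is printed before the female17 match.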

Upvotes: 0

Nicolas Paris

Reputation: 134

This code produces the output you want. It may not be the best way, but it seems to work as expected.

#data = [
    #'offspr84 175177 200172 312312 310326 338342 252240 226210 113129 223264',
    #'male28 197175 172200 308312 310338 262338 256252 190226 113129 223219',
    #'female13 197177 172172 312308 318326 342350 240248 210218 129113 267247']

data = [
'offspr85 181177 192160 320312 290362 358330 238238 214178 133129 263223',
'male65 197181 176192 320268 322286 358330 238244 206214 137133 267263',
'female17 181177 160172 280312 362346 350326 230238 126178 129129 223167' ]

for i, line in enumerate(data):
    data[i] = line.split(' ')

# Report every offspring value that also appears anywhere in a parent row.
for item in data[0]:
    if data[1].count(item) > 0:
        print(data[0][0], item)
        print(data[1][0], item)

    if data[2].count(item) > 0:
        print(data[0][0], item)
        print(data[2][0], item)

Update:

With a nested list, both records can be processed at once:

datas = [[
'offspr85 181177 192160 320312 290362 358330 238238 214178 133129 263223',
'male65 197181 176192 320268 322286 358330 238244 206214 137133 267263',
'female17 181177 160172 280312 362346 350326 230238 126178 129129 223167' ],
[
'offspr84 175177 200172 312312 310326 338342 252240 226210 113129 223264',
'male28 197175 172200 308312 310338 262338 256252 190226 113129 223219',
'female13 197177 172172 312308 318326 342350 240248 210218 129113 267247']
]
for data in datas:
    for i, line in enumerate(data):
        data[i] = line.split(' ')


# Same check as before, repeated for each record in the nested list.
for data in datas:
    for item in data[0]:
        if data[1].count(item) > 0:
            print(data[0][0], item)
            print(data[1][0], item)

        if data[2].count(item) > 0:
            print(data[0][0], item)
            print(data[2][0], item)
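
The lists above are typed in by hand. As a rough sketch, assuming the input uses `###` lines as record separators exactly as shown in the question, the nested list could instead be built from the file contents. The helper name `read_groups` and the inline `text` string are assumptions made for illustration; in practice `text` would come from reading the real file.

```python
def read_groups(text, separator="###"):
    """Split text into records; each record is a list of rows,
    and each row is a list of whitespace-separated fields."""
    groups = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if line == separator:
            if current:  # close the record collected so far
                groups.append(current)
            current = []
        elif line:  # ignore blank lines
            current.append(line.split())
    if current:  # last record when there is no trailing separator
        groups.append(current)
    return groups


# Sample input copied from the question.
text = """\
###
offspr84 175177 200172 312312 310326 338342 252240 226210 113129 223264
male28 197175 172200 308312 310338 262338 256252 190226 113129 223219
female13 197177 172172 312308 318326 342350 240248 210218 129113 267247
###
offspr85 181177 192160 320312 290362 358330 238238 214178 133129 263223
male65 197181 176192 320268 322286 358330 238244 206214 137133 267263
female17 181177 160172 280312 362346 350326 230238 126178 129129 223167
###
"""

datas = read_groups(text)
for data in datas:
    print(data[0][0], len(data), "rows")
```

The rows returned here are already split into fields, so the `line.split(' ')` loop would not be needed before running the matching loops on `datas`.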

Upvotes: 1

Kent

Reputation: 195039

Try this awk code:

awk '/###/ {i++; next}
i == 1 {
    if ($0 ~ /offspr84/) {
        a = $9; n = $1; next
    }
    if ($9 == a) {print n, a; print $1, $9}
}
i == 2 {
    if ($0 ~ /offspr85/) {
        m = $1; p = $2; q = $6; next
    }
    if ($2 == p) {print m, p; print $1, p}
    if ($6 == q) {print m, q; print $1, q}
}' yourFile

Upvotes: 0

keis

Reputation: 113

I'm not entirely sure how you want the matching to work, but assuming the same pattern is applied to all fields, you can easily do this by looping over the fields, e.g.:

{
    for(i=2; i<=NF; i++) {
        if (match($i, "some regexp")) {
            print $1, $i
        }
    }
}
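
If Python turns out to be the better fit, the same loop over fields might look roughly like this. The function name `fields_matching` and the pattern `^312` are placeholders chosen for illustration, standing in for "some regexp" above; `re.search` is used because, like awk's `match`, it finds the pattern anywhere in the field.

```python
import re


def fields_matching(line, pattern):
    """Return (name, field) pairs for every field after the first
    that matches the regular expression."""
    fields = line.split()
    if not fields:
        return []
    name = fields[0]
    return [(name, field) for field in fields[1:] if re.search(pattern, field)]


# One row copied from the question; "^312" is a placeholder pattern.
row = "offspr84 175177 200172 312312 310326 338342 252240 226210 113129 223264"
for name, value in fields_matching(row, r"^312"):
    print(name, value)
```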

Upvotes: 0
