timmalos
timmalos

Reputation: 542

Join two files in powershell

Really need help on this :(I 'll try to be as simple as possible.

I got one big file looking like this:

ID,Info1,Info2,info3,...

On each line, i got one ID and a lot of stuff, comma separated. There can be > 3000 lines.

Now i got a second file like this :

ID,Info4,Info5,Info6,...

The first file contains ALL the elements whereas the second file contains only some of them.

For example, first one:

BLA1,some stuff...
BLA2,some stuff...
BLA3,some stuff...
ALO1,some stuff...
ALO2,some stuff...

And the second one :¨

BLA3,some stuff2... 
ALO1,some stuff2...
BLA1,some stuff2... 

What i want is simple, I want to append all the 'some stuff2...' of the second file to the first one like a join type=left with sql

I want the first file to have now :

BLA1,some stuff...,some stuff2...
BLA2,some stuff...
BLA3,some stuff...,some stuff2...
ALO1,some stuff...,some stuff2...
ALO2,some stuff...

I tried something like this :

ForEach ($line in $file1) {
    $colA = $line.Split(',')
    ForEach ($line in $file2) {
        $colB = $line.Split(',')
        if($colA[0]-eq $colB[0]) { #Item found in file2
            $out += $date + $colA[1]+","+ ... +","+ $colB[1]+","+ ... +"`n"
        }else { 
            $out += $date + $colA[1]+","+ ... +"`n"
        }
    }
}

But it takes so much time it dosnt success (and maybe there were other problems i didnt see). What's the best way? a 2D Array? I could try to sort the IDs and then script a little, but as its not numerical only i don't know how to process.

Thks a lot guys for your help,

Upvotes: 0

Views: 524

Answers (3)

Ansgar Wiechers
Ansgar Wiechers

Reputation: 200293

If you want to do a LEFT JOIN, you could load the files into a temporary database and actually do a LEFT JOIN. See here for an example using SQLite.

Upvotes: 0

Keith Hill
Keith Hill

Reputation: 201652

Use a hashtable where the key is the ID.

$ht = [ordered]@{}
foreach ($line in $file1) {
    $id,$rest = $line -split ',',2
    $ht[$id] = $line
}
foreach ($line in $file2) {
    $id,$rest = $line -split ',',2
    if ($ht.ContainsKey($id)) {
        $ht[$id] += ",$rest"
    }
    else {
        $ht[$id] = $line
    }
}
$ht.Values > newfile.txt

Upvotes: 2

philselmer
philselmer

Reputation: 751

I went with the assumption that you either have known header lines or can add them...

f1.csv

Name,Item_1
BLA1,thing_bla1_1
ALB1,thing_alb1_1
BLA2,thing_bla2_1
ALB2,thing_alb2_1
BLA3,thing_bla3_1
ALB3,thing_alb3_1

f2.csv

Name,Item_2
BLA3,thing_bla3_2
ALB3,thing_alb3_2
BLA1,thing_bla1_2
ALB1,thing_alb1_2
BLA2,thing_bla2_2
ALB2,thing_alb2_2

Code:

$grouped = Import-Csv .\f1.csv, .\f2.csv | group -property Name -ashashtable

$($grouped.Keys | foreach {$obj = $grouped.Item("$_")[0].Name + "," + $grouped.Item("$_")[0].Item_1 + "," + $grouped.Item("$_")[1].Item_2; $obj}) | Out-File .\test.csv

What we are doing here is importing the two CSVs into one element, then grouping the items of the same name in the hash table. Then we pipe the keys (the non-duplicated names from the files) into a foreach that combines them into one line. We need the $() around those statements to allow the output to be piped to Out-File.

I'm nearly positive that there is a cleaner way to do the inside of the foreach, but this does work.

The output (text.csv):

ALB1,thing_alb1_1,thing_alb1_2
BLA2,thing_bla2_1,thing_bla2_2
ALB3,thing_alb3_1,thing_alb3_2
BLA1,thing_bla1_1,thing_bla1_2
ALB2,thing_alb2_1,thing_alb2_2
BLA3,thing_bla3_1,thing_bla3_2

Upvotes: 1

Related Questions