Tero Isotalo

Reputation: 63

Select CSV columns in Powershell where header name contains a specific string

I have a data file of about 10-15 columns from which I want to extract specific columns. For some of the columns I know the exact header; for others I only know that the first two letters will always be "FC". How do I select only the columns whose header I know plus those that start with "FC"? Starting with just the "FC" columns, I tried this:

$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"
$FCcols = $myCSV[0].psobject.Properties | foreach { $_.Name } | Where {$_ -match "FC"}
$myCSV | select $FCcols

But I just get an error:

Select-Object : Cannot convert System.Management.Automation.PSObject to one of 
the following types {System.String, System.Management.Automation.ScriptBlock}.
At line:3 char:16
+ $myCSV | select <<<<  $FCcols
    + CategoryInfo          : InvalidArgument: (:) [Select-Object], NotSupport 
   edException
    + FullyQualifiedErrorId : DictionaryKeyUnknownType,Microsoft.PowerShell.Co 
   mmands.SelectObjectCommand

Then, if I try:

$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"
$FCcols = [System.Collections.ArrayList]@()
$myCSV[0].psobject.Properties | foreach { $_.Name } | Where {$_ -match "FC"} | %{$FCcols.Add($_)}
$myCSV | select $FCcols

I get the output I want except that it is in "column header : value" format, like this:

FC1839 : 0
FC1842 : 1
FC1843 : 6
FC1844 : 12
FC1845 : 4

FC1839 : 0
FC1842 : 0
FC1843 : 19
FC1844 : 22
FC1845 : 14

I am probably just missing something simple, but how do I get to the point that I am able to select these matching columns and then output them to another .txt file (without the header : value format)?

Upvotes: 6

Views: 11272

Answers (4)

mklement0

Reputation: 437953

First things first: Mathias R. Jessen's helpful tip not only solves your problem, but significantly simplifies the approach (and also works in PSv2):

$myCSV | Select-Object FC*

The (implied) -Property parameter supports wildcard expressions, so FC* matches all property (column) names that start with FC.
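Since the question also asks about columns whose full header is known, here is a minimal sketch that combines an exact name with a wildcard in a single Select-Object call. The sample data and the column names SampleID and Notes are made up for illustration; only the FC prefix comes from the question.

```powershell
# Hypothetical tab-separated sample data; SampleID and Notes are made-up column names.
$sample = "SampleID`tFC1839`tFC1842`tNotes`nA1`t0`t1`tfoo`nA2`t0`t0`tbar"

# Parse the text into objects, one per data row (the first line is the header).
$rows = $sample -split "`n" | ConvertFrom-Csv -Delimiter "`t"

# Mix an exact column name with a wildcard pattern; the Notes column is dropped.
$selected = $rows | Select-Object SampleID, FC*

# List the column names of the result.
$selected[0].psobject.Properties | ForEach-Object { $_.Name }
```

With real data, the same Select-Object call can sit directly after Import-Csv in the pipeline.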

As for the output format you're seeing: Because you're selecting 5 or more properties, PowerShell defaults to implicit Format-List formatting, with each property name-value pair on its own line.

To fix this display problem, pipe to Format-Table explicitly (which is what PowerShell would do implicitly if you had selected 4 or fewer properties):

$myCSV | Select-Object FC* | Format-Table

To re-export the results to a CSV (TSV) file:

Import-Csv mydata.txt -Delimiter "`t" | Select-Object FC* | 
  Export-Csv myresults.txt -Encoding Utf8 -Delimiter "`t" -NoTypeInformation

To do so without a header line:

Import-Csv mydata.txt -Delimiter "`t" | Select-Object FC* | 
  ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | Select-Object -Skip 1 |
    Set-Content myresults.txt -Encoding Utf8

As for your specific symptom:

The problem occurs only in PSv2, and it smells like a bug to me.

The workaround is to make your column-name array a strongly typed string array ([string[]]):

[string[]] $FCcols = $myCSV[0].psobject.Properties | % { $_.Name } | ? { $_ -match '^FC' }

Note that, for brevity, I've used built-in alias % in lieu of ForEach-Object and ? in lieu of Where-Object.
Also note that the regex passed to -match was changed to ^FC to ensure that only columns that start with FC are matched.


Your code works as-is in PSv3+, but can be simplified:

$FCcols = $myCSV[0].psobject.Properties.Name -match "^FC"

Note how .Name is applied directly to .psobject.Properties, which in v3+ causes the .Name member to be invoked on each item of the collection, a feature called member-access enumeration.
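As a rough illustration of member-access enumeration combined with -match as a filter, the following sketch uses a made-up one-row object in place of an imported CSV row (the column names are hypothetical). It requires PSv3 or later.

```powershell
# Hypothetical one-row object standing in for an imported CSV row.
$row = "ID`tFC1839`tFC1842`n7`t0`t1" -split "`n" | ConvertFrom-Csv -Delimiter "`t"

# PSv3+: .Name is evaluated on every property in the collection.
$allNames = $row.psobject.Properties.Name

# With an array on the left-hand side, -match filters and returns the matching elements.
$fcNames = $allNames -match '^FC'
$fcNames
```

One caveat: if an object has only a single property, .Name yields a single string rather than an array, and -match then returns a Boolean; wrapping the expression in @(...) guards against that.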

Upvotes: 4

Tero Isotalo

Reputation: 63

I finally came up with a "quick and dirty" solution which I'm disappointed to not have figured out earlier.

$myCSV = Import-CSV "mydata.txt" -Delimiter "`t" | select FC*
for ($i = 0; $i -lt $myCSV.count; $i++) {
    # collect the values of the current row and join them with tabs
    $writeline = ($myCSV[$i] | %{$_.PSObject.Properties | %{$_.Value}}) -join "`t"
    # ac is the built-in alias for Add-Content
    ac "myresults.txt" $writeline -Encoding utf8
}

The first line gives me the columns I want. The for loop then takes the property values of each row and joins them into a tab-separated line, and each line is appended to a text file. This may not be the pedagogically correct way to achieve the result, but it works so far. Thanks to everyone for their input!

Upvotes: 0

Dave Sexton

Reputation: 11188

I would use Get-Member to get your columns, something like this:

$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"
$myCSV | select ($myCSV | gm -MemberType NoteProperty | ? {$_.Name -match 'FC'}).Name

Upvotes: 2

G42

Reputation: 10019

Mathias's helpful comment is the best way to go for selecting: simple and elegant. I didn't know it was an option.

$myCSV | Select *FC*,ColumnIKnowTheNameOf

I believe you need to add Export-Csv to answer your last question. Here's another approach I'd already worked on that makes use of Get-Member and NoteProperty, in case you need to interrogate CSV-like objects in the future.

$myCSV = Import-CSV "mydata.txt" -Delimiter "`t"

# you can get the headings by using Get-Member and selecting NoteProperty members.
# @(...) keeps the result an array even if only one column matches, so += below appends an element.
$FCcols = @($myCSV |
            Get-Member |
            Where-Object {$_.MemberType -eq "NoteProperty" -and $_.Name -match "FC"} |
            Select-Object -ExpandProperty Name)

# you can add known column names to this array.
$FCcols += "ColumnIKnowTheNameOf"

$myCSV | Select-Object $FCcols

# to get a tab-delimited file similar to the one you imported, pipe the selected columns to Export-Csv
$myCSV | Select-Object $FCcols | Export-Csv "myresults.txt" -Delimiter "`t" -NoTypeInformation

Upvotes: 0
