aandroidtest

Reputation: 1503

Python: converting a tab-delimited file into CSV

I want to convert the tab-delimited text file http://www.linux-usb.org/usb.ids into a CSV file.

I tried importing it with Excel, but the result is not ideal. It comes out like this:

8087  Intel Corp.
                   0020  Integrated Rate Matching Hub
                   0024  Integrated Rate Matching Hub

For easy searching, I want it to look like this instead:

8087  Intel Corp.    0020  Integrated Rate Matching Hub
8087  Intel Corp.    0024  Integrated Rate Matching Hub

Is there a way to do this in Python?

Upvotes: 0

Views: 248

Answers (3)

JimJ0hns0n

Reputation: 21

$ListDirectory = "C:\USB_List.csv"

Invoke-WebRequest 'http://www.linux-usb.org/usb.ids' -OutFile $ListDirectory

$pageContents = Get-Content $ListDirectory | Select-Object -Skip 22

"vendor`tvendor_name`tproduct`tproduct_name`r" > $ListDirectory

#Variables and Flags
$currentVid = $null
$currentVName = $null
$currentPid = $null
$currentPName = $null
$vendorDone = $TRUE
$interfaceFlag = $FALSE
$nextline = $null
$tab = "`t"

foreach($line in $pageContents){

    if($line.StartsWith("`#")){
        continue
    }
    elseif($line.length -eq 0){
        break  # stop at the first blank line; 'exit' would also close an interactive session
    } 

    if(!($line.StartsWith($tab)) -and ($vendorDone -eq $TRUE)){
        $vendorDone = $FALSE
    }

    if(!($line.StartsWith($tab)) -and ($vendorDone -eq $FALSE)){
        $pos = $line.IndexOf("  ")
        $currentVid = $line.Substring(0, $pos)
        $currentVName = $line.Substring($pos+2)
        "$currentVid`t$currentVName`t`t`r" >> $ListDirectory
        $vendorDone = $TRUE
    }
    elseif ($line.StartsWith($tab)){

        if ($interfaceFlag -eq $TRUE){
            $interfaceFlag = $FALSE
        }
        $nextline = $line.Substring(1)  # drop only the first tab; TrimStart() would also strip the second tab that marks interface lines
        if ($nextline.StartsWith($tab)){
            $interfaceFlag = $TRUE
        }
        if ($interfaceFlag -eq $FALSE){
            $pos = $nextline.IndexOf("  ")
            $currentPid = $nextline.Substring(0, $pos)
            $currentPName = $nextline.Substring($pos+2)
            "$currentVid`t$currentVName`t$currentPid`t$currentPName`r" >> $ListDirectory
            Write-Host "$currentVid`t$currentVName`t$currentPid`t$currentPName`r"
            $interfaceFlag = $FALSE
        }
    } 
}

I know the question asks for Python, but I built this PowerShell script to do the job. It takes no parameters. Just run it as admin from the directory where you want to store the script. The script downloads everything from the http://www.linux-usb.org/usb.ids page, parses the data, and writes it to a tab-delimited file. You can then open the file in Excel as a tab-delimited file. Make sure the columns are read as "text" and not "general" and you're good to go. :)

Parsing this page is tricky because the script has to be contextually aware of every VID-Vendor line preceding a series of PID-Product lines. I also forced the script to ignore the commented description section, the interface-interface_name lines, the random comments that the maintainer inserted throughout the USB list (sigh), and everything from "# List of known device classes, subclasses and protocols" onward, which is out of scope for this request.

I hope this helps!

Upvotes: 2

Andrew Charlton

Reputation: 63

Something like this would work:

import csv

lines = []

with open("usb.ids.txt") as f:
    reader = csv.reader(f, delimiter="\t")

    device = ""
    for line in reader:

        # Ignore empty lines and comments
        if len(line) == 0 or (len(line[0]) > 0 and line[0][0] == "#"):
            continue

        if line[0] != "":
            device = line[0]

        elif line[1] != "":
            lines.append((device, line[1]))


print(lines)

You basically need to loop through each line and, if it's a device line, remember it for the lines that follow. This only handles two columns, and you would then still need to write them all to a CSV file, but that's easy enough.
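For the remaining step, here is a minimal sketch of writing the collected pairs to CSV. It assumes each field has the form "ID, two spaces, name" (as in the sample in the question) and splits on the first double space to get four columns. The helper names `split_id_name` and `write_csv` and the header row are my own choices for illustration, not part of the code above.

```python
import csv
import io


def split_id_name(field):
    """Split '8087  Intel Corp.' into ('8087', 'Intel Corp.') at the first double space."""
    ident, _, name = field.partition("  ")
    return ident, name.strip()


def write_csv(pairs, out):
    """Write (device, product) string pairs as four-column CSV rows to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["vendor", "vendor_name", "product", "product_name"])  # assumed header
    for device, product in pairs:
        writer.writerow([*split_id_name(device), *split_id_name(product)])


# Sample data in the same shape as the `lines` list built above
lines = [
    ("8087  Intel Corp.", "0020  Integrated Rate Matching Hub"),
    ("8087  Intel Corp.", "0024  Integrated Rate Matching Hub"),
]

buf = io.StringIO()
write_csv(lines, buf)
print(buf.getvalue())
```

To write to disk instead, pass a real file, for example `open("usb.csv", "w", newline="")`; the csv module quotes any names that contain commas.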

Upvotes: 0

Tommy

Reputation: 622

You just need to write a little program that reads the data one line at a time and checks whether the first character is a tab ('\t'). If it is not, store that line. If it is, print the previously stored line followed by the current line. The result will be the list in the format you want.
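A rough sketch of that idea in Python might look like the following. The function name `flatten` is hypothetical, and skipping blank lines, `#` comments, and double-tab (interface) lines is an extra assumption on my part, based on how usb.ids is laid out rather than on the description above.

```python
def flatten(lines):
    """Yield 'stored<TAB>current' for each line that starts with a single tab."""
    stored = None
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' comments carry no data
        if not line or line.startswith("#"):
            continue
        if not line.startswith("\t"):
            stored = line  # no leading tab: remember this line
        elif stored is not None and not line.startswith("\t\t"):
            # One leading tab: emit the stored line followed by the current one
            yield f"{stored}\t{line[1:]}"


sample = [
    "# comment\n",
    "8087  Intel Corp.\n",
    "\t0020  Integrated Rate Matching Hub\n",
    "\t0024  Integrated Rate Matching Hub\n",
]

result = list(flatten(sample))
for row in result:
    print(row)
```

With a real file you would pass the open file object directly, since iterating over it yields one line at a time.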

Upvotes: 1
