Importing and manipulating a .txt file in Mathematica

Question

I have a huge text file that I import into Mathematica. It looks something like this:

    In[9]:=import=SplitBy[Import["textfile.txt","List"],"\t"];

    Out[9]:={{  A   021 2.3 A   002 2.6},{  A   012 2.3 A   001 2.6},{  A   120 2.6 A   111 2.9},{  A   122 2.8 A   121 2.8},{  A   000 1.3 A   121 2.9},{  A   110 2.4 A   111 2.9},{  G   010 2.3 G   001 2.6},{  G   000 2.2 G   001 2.3 G   010 2.4},{  G   010 2.3 G   001 2.6},{  G   110 2.3 G   101 2.6}}

EDIT: note that all elements are separated by a \t character.

This is a list of strings such that

    In[12]:= Head@import
    Head@import[[1]]
    Head@import[[All, 1]]
    Head@import[[1, 1]]

    Out[12]= List
    Out[13]= List
    Out[14]= List
    Out[15]= String

My main problem is converting this list into a manageable list of elements, so that I can search for the entries where G is present and not those where A is present. I tried replacing the tab characters in the strings with another character, but I still can't treat the data the way I want, because it doesn't let me search for individual G elements. Ideally, what I want to end up with is

    {{G,010,2.3},{G,001,2.6},{G,000,2.2},{G,001,2.3},{G,010,2.4},{G,010,2.3},{G,001,2.6},{G,110,2.3},{G,101,2.6}}

I already know that I will have to use the Take command, the Partition command to split each sublist into sublists of 3 elements, and so on. But because I'm not even able to get the data into a list of lists, I can't get that far.

Further, when importing I have to select the "List" type. If I imported as "Table", everything would already be half done, but elements such as "001" would become "1".

Could you guys help me out please? All help is appreciated! Thanks

High Performance Mark · Accepted Answer

I don't have Mathematica on this machine, so my syntax may be a bit awry.

Doesn't

    niceList = Partition[Flatten[import], 3]

produce a list of lists, where each inner list comprises 3 strings? Then, something like

    Select[niceList, #[[1]] == "G" &]

should select the sublists which have "G" as the first element.
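As a minimal sketch of that idea, the snippet below runs the two steps on a small hand-written list. It assumes the fields have already been separated, so that every inner list holds individual strings; the name `tokenized` and its contents are hypothetical sample data, not the asker's file.

```mathematica
(* Hypothetical, already-tokenized sample: every field is its own string *)
tokenized = {
   {"A", "021", "2.3", "A", "002", "2.6"},
   {"G", "010", "2.3", "G", "001", "2.6"}
   };

(* Flatten to one long list of fields, then regroup into triples *)
niceList = Partition[Flatten[tokenized], 3];

(* Keep only the triples whose first field is "G" *)
gTriples = Select[niceList, #[[1]] == "G" &]
```

If each inner list instead holds one long tab-separated string, Flatten only yields those whole strings, so the fields need to be split apart first.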

EDIT

If I understand you now, you mean that your variable import holds a list of lists, where each of the lower-level lists, such as

    {  A   021 2.3 A   002 2.6}

contains a single string? In other words,

    FullForm[import[[1, 1]]]

returns

    "  A   021 2.3 A   002 2.6"

I would import the data, replace all the tab characters with spaces, and then use StringSplit[] (at the right level) to turn each string into a list of strings. Then Flatten, Partition, etc. You might find it easiest to start by importing the entire contents of the file into a single string.
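A rough sketch of that pipeline follows, under stated assumptions. The string `raw` is a hypothetical two-line stand-in for what something like `Import["textfile.txt", "Text"]` might return, and all variable names are illustrative. Because the fields stay as strings throughout, codes such as "010" keep their leading zeros.

```mathematica
(* Hypothetical stand-in for the file contents read as one string, *)
(* e.g. via Import["textfile.txt", "Text"]                          *)
raw = "\tA\t021\t2.3\tA\t002\t2.6\n\tG\t010\t2.3\tG\t001\t2.6";

(* Replace tabs with spaces, as suggested above *)
spaced = StringReplace[raw, "\t" -> " "];

(* With no pattern given, StringSplit splits on runs of whitespace, *)
(* which also covers the newlines between lines                     *)
tokens = StringSplit[spaced];

(* Regroup the flat field list into {letter, code, value} triples *)
triples = Partition[tokens, 3];

(* Keep only the triples that start with "G" *)
gOnly = Select[triples, First[#] == "G" &];

(* Optional: turn only the third field into a number, keep the code as a string *)
gNumeric = {#1, #2, ToExpression[#3]} & @@@ gOnly
```

Since StringSplit already treats tabs as whitespace, the StringReplace step is probably optional; it is kept here only to mirror the description above.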
