Sos

Reputation: 1949

Importing and manipulating a .txt file in Mathematica

I have a huge text file that I import into Mathematica. It looks something like this:

    In[9]:=import=SplitBy[Import["textfile.txt","List"],"\t"];

    Out[9]={{  A   021 2.3 A   002 2.6},{  A   012 2.3 A   001 2.6},{  A   120 2.6 A   111 2.9},{  A   122 2.8 A   121 2.8},{  A   000 1.3 A   121 2.9},{  A   110 2.4 A   111 2.9},{  G   010 2.3 G   001 2.6},{  G   000 2.2 G   001 2.3 G   010 2.4},{  G   010 2.3 G   001 2.6},{  G   110 2.3 G   101 2.6}}

EDIT: Note that all elements are separated by a tab (\t) character.

This is a list of lists of strings, as these checks show:

    In[12]:= Head@import
    Head@import[[1]]
    Head@import[[All, 1]]
    Head@import[[1, 1]]

    Out[12]= List
    Out[13]= List
    Out[14]= List
    Out[15]= String
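The nesting comes from SplitBy: it groups consecutive list elements that give the same value of its second argument, and it never splits the strings themselves. With "\t" in the function slot, each line yields a distinct unevaluated expression such as "\t"[line], so every line (unless it repeats the one before it) ends up alone in its own sublist. A minimal sketch of the difference, assuming the lines are tab-separated with a leading tab; the two sample lines are hypothetical. StringSplit is the function that actually splits each string at the tabs:

```mathematica
(* Hypothetical sample lines; assumes tab-separated fields with a leading tab *)
lines = {"\tA\t021\t2.3\tA\t002\t2.6", "\tG\t010\t2.3\tG\t001\t2.6"};

(* SplitBy only groups list elements; each string stays intact *)
grouped = SplitBy[lines, "\t"]
(* {{"\tA\t021\t2.3\tA\t002\t2.6"}, {"\tG\t010\t2.3\tG\t001\t2.6"}} *)

(* StringSplit splits each string at the tabs; by default the empty
   substring before the leading tab is dropped *)
fields = StringSplit[#, "\t"] & /@ lines
(* {{"A", "021", "2.3", "A", "002", "2.6"}, {"G", "010", "2.3", "G", "001", "2.6"}} *)
```

Because every field stays a string, codes like "021" keep their leading zeros.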

My big problem is converting this list into a manageable list of elements, so that I can select the entries where G is present rather than those where A is present. I tried replacing the parts of the strings where was present by a . But I still can't treat the data as I want, because it doesn't let me search for individual G elements. Ideally, what I want to get in the end is

    {{G,010,2.3},{G,001,2.6},{G,000,2.2},{G,001,2.3},{G,010,2.4},{G,010,2.3},{G,001,2.6},{G,110,2.3},{G,101,2.6}}

I already know that I will have to use Take, Partition (to split the sublists into sublists of 3 elements), and so on. But because I'm not even able to get the data into a list of lists, I can't get that far.

Furthermore, when importing I have to select the "List" format. If I imported as "Table", everything would already be half done, but entries such as "001" would become 1.
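If the "Table" import is used anyway, one possible workaround is to restore the zeros afterwards with IntegerString, whose third argument pads the result with leading zeros to a given length. This rests on an assumption not confirmed by the data shown: every code is exactly three digits long (a code such as "0001" could not be recovered). The rows below are a hypothetical stand-in for what a "Table" import returns:

```mathematica
(* Hypothetical rows as a "Table" import might return them: leading zeros already lost *)
rows = {{"A", 21, 2.3, "A", 2, 2.6}, {"G", 0, 2.2, "G", 1, 2.3, "G", 10, 2.4}};

(* Flatten the rows and regroup them into {letter, code, value} triples *)
records = Partition[Flatten[rows], 3];

(* Assumption: every code has exactly three digits, so pad back to length 3 *)
fixed = {#1, IntegerString[#2, 10, 3], #3} & @@@ records;

gRecords = Select[fixed, First[#] === "G" &]
(* {{"G", "000", 2.2}, {"G", "001", 2.3}, {"G", "010", 2.4}} *)
```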

Could you guys help me out please? All help is appreciated! Thanks

Upvotes: 0

Views: 1840

Answers (2)

High Performance Mark

Reputation: 78316

I don't have Mathematica on this machine, so my syntax may be a bit awry.

Doesn't

    niceList = Partition[Flatten[import], 3]

produce a list of lists, where each inner list comprises 3 strings? Then something like

    Select[niceList, #[[1]] == "G" &]

should select the sub-lists which have a "G" as the first element.

EDIT

If I understand you correctly now, your variable import is a list of lists in which each lower-level list, such as

    {  A   021 2.3 A   002 2.6}

contains a single string? In other words,

    FullForm[import[[1, 1]]]

returns

    "  A   021 2.3 A   002 2.6"

I would import the data, replace all the tab characters with spaces, and then use StringSplit[] (at the right level) to turn each string into a list of strings. Then Flatten, Partition, etc. You might find it easiest to start by importing the entire contents of the file as a single string.
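A minimal sketch of that suggestion, assuming the fields are separated only by tabs and newlines and that every record is a letter, a code and a value. StringSplit with no second argument already splits at any run of whitespace, tabs and newlines included, so the tab-to-space replacement is not strictly needed. The inline sample string is hypothetical; with the real file, Import["textfile.txt", "Text"] would supply the whole contents as one string:

```mathematica
(* Hypothetical stand-in for Import["textfile.txt", "Text"], which reads the
   whole file into a single string *)
text = "\tA\t021\t2.3\tA\t002\t2.6\n\tG\t000\t2.2\tG\t001\t2.3\tG\t010\t2.4\n\tG\t110\t2.3\tG\t101\t2.6";

(* Split at runs of whitespace (tabs and newlines); all fields stay strings *)
fields = StringSplit[text];

(* Regroup into {letter, code, value} triples; Partition silently drops an
   incomplete trailing group, so check Length[fields] if that matters *)
records = Partition[fields, 3];

gRecords = Select[records, First[#] === "G" &]
(* {{"G", "000", "2.2"}, {"G", "001", "2.3"}, {"G", "010", "2.4"},
    {"G", "110", "2.3"}, {"G", "101", "2.6"}} *)
```

Keeping the codes as strings preserves "010" and friends. If the values are needed as numbers, {#1, #2, ToExpression[#3]} & @@@ gRecords converts only the third field.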

Upvotes: 2

Mr.Wizard

Reputation: 24336

In the future it would be very helpful if you could include a sample of the actual file you are importing. Nevertheless, I believe I can guess the format of the file with sufficient accuracy to recommend this:

    data = ReadList["textfile.txt", {Word, Number, Number}]

If the file is in the format I hope it is, this should return:

    {{"A", 21, 2.3}, {"A", 2, 2.6}, {"A", 12, 2.3}, {"A", 1, 2.6}, {"A", 
      120, 2.6}, {"A", 111, 2.9}, {"A", 122, 2.8}, {"A", 121, 2.8}, {"A", 
      0, 1.3}, {"A", 121, 2.9}, {"A", 110, 2.4}, {"A", 111, 2.9}, {"G", 
      10, 2.3}, {"G", 1, 2.6}, {"G", 0, 2.2}, {"G", 1, 2.3}, {"G", 10, 
      2.4}, {"G", 10, 2.3}, {"G", 1, 2.6}, {"G", 110, 2.3}, {"G", 101, 
      2.6}}

From there, getting the records that start with "G" can be done with any of these, at your preference:

    Cases[data, {"G", ___}]

    Select[data, "G" === #[[1]] &]

    Pick[data, First /@ data, "G"]
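Reading the second field as Number turns codes such as 021 into 21. A possible variant, assuming the codes should stay text as in the question's desired output, reads that field as Word instead. The sketch below uses StringToStream on a hypothetical sample so it runs without the file; with the real file, ReadList["textfile.txt", {Word, Word, Number}] would be the direct equivalent. It also checks that the three selection expressions above agree:

```mathematica
(* Hypothetical sample standing in for "textfile.txt"; tab-separated fields *)
sample = "A\t021\t2.3\tA\t002\t2.6\nG\t000\t2.2\tG\t001\t2.3\tG\t010\t2.4";

stream = StringToStream[sample];
(* Word for the second field keeps codes as strings, leading zeros included *)
data = ReadList[stream, {Word, Word, Number}];
Close[stream];

(* The three selection forms from this answer *)
byCases = Cases[data, {"G", ___}];
bySelect = Select[data, "G" === #[[1]] &];
byPick = Pick[data, First /@ data, "G"];

byCases === bySelect === byPick
```

Of the three, Cases with a pattern is the easiest to extend, for example {"G", "001", _} to match a specific code as well.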

Upvotes: 4
