Reputation: 2007
I have a text file containing simulation data (60 columns, 100k rows):
a b c
1 11 111
2 22 222
3 33 333
4 44 444
... where the first row holds the variable names and each column beneath holds the corresponding data (float type).
I need to use all these variables with their data in Python for further calculations. For example, when I run:
print(b)
I expect to get the values from the second column.
I know how to import data:
data = np.genfromtxt("1.txt", unpack=True, skip_header=1)  # skip_header replaces the removed skiprows argument
Assign variables "manually":
a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)  # skip_header replaces the removed skiprows argument
But I'm having trouble with getting variable names:
import csv

rows = []  # avoid shadowing the built-in name "list"
with open("1.txt", "rt") as f:
    reader = csv.reader(f)
    for row in reader:
        rows.append(row)
variables = rows[0]
How can I change this code to get all variable names from the first row and assign them to the imported arrays?
Upvotes: 4
Views: 77633
Reputation: 1055
Here is a simple way to convert a .txt file of variable names and data to NumPy arrays.
import numpy as np

D = np.genfromtxt('1.txt', dtype='str')       # load the data in as strings
D_data = np.asarray(D[1:, :], dtype=float)    # convert the data rows to floats
D_names = D[0, :]                             # save the variable names from the header row
for i in range(len(D_names)):
    key = D_names[i]       # the name of this variable
    val = D_data[:, i]     # the column of values for this variable
    exec(key + '=val')     # create a variable with that name (works at module level)
I like this method because it is easy to follow and simple to maintain. We can compact this code as follows:
D = np.genfromtxt('1.txt', dtype='str')  # load the data in as strings
for i in range(D.shape[1]):
    val = np.asarray(D[1:, i], dtype=float)  # the column of values for this variable
    exec(D[0, i] + '=val')                   # create a variable named after the header entry
Both snippets do the same thing: they create NumPy arrays named a, b, and c that hold the associated data.
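If you would rather not use exec, one possible alternative is the names=True option of np.genfromtxt, which reads the field names from the first line and returns a structured array. The snippet below is only a minimal sketch: the in-memory sample text stands in for 1.txt, and it assumes a whitespace-separated file with a single header row.

```python
import io

import numpy as np

# Stand-in for the contents of 1.txt (assumption: whitespace-separated,
# one header row followed by numeric rows).
sample = io.StringIO("a b c\n1 11 111\n2 22 222\n3 33 333\n4 44 444\n")

# names=True tells genfromtxt to take the field names from the first line.
D = np.genfromtxt(sample, names=True)

print(D.dtype.names)  # the field names read from the header
print(D["b"])         # the second column as a float array
```

Each column is then available by name as D["a"], D["b"], and so on, without creating any variables dynamically.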
Upvotes: 0
Reputation: 2007
Thanks to @andyg0808 and @Zero Piraeus, I have found another solution. For me, the most appropriate approach is to use pandas (the Python Data Analysis Library).
import pandas as pd

# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
data = pd.read_csv("1.txt", sep=r"\s+")
result = data["a"] * data["b"] * 3
print(result)
0 33
1 132
2 297
3 528
...where 0,1,2,3 are the row index.
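If plain NumPy arrays are needed for the later calculations, the columns can be pulled out of the DataFrame by name. This is just a sketch under the same assumptions as above: the in-memory sample text stands in for 1.txt, and the dictionary name arrays is made up for illustration.

```python
import io

import pandas as pd

# Stand-in for the contents of 1.txt (assumption: whitespace-separated columns).
sample = io.StringIO("a b c\n1 11 111\n2 22 222\n3 33 333\n4 44 444\n")

data = pd.read_csv(sample, sep=r"\s+")

# Build a dict mapping each column name to a float NumPy array.
# "arrays" is a hypothetical name used only for this example.
arrays = {name: data[name].to_numpy(dtype=float) for name in data.columns}

print(list(arrays))  # column names taken from the header row
print(arrays["b"])   # values from the second column
```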
Upvotes: 0
Reputation: 1403
Instead of trying to assign names, you might think about using an associative array, which is known in Python as a dict, to store your variables and their values. The code could then look something like this (borrowing liberally from the csv docs):
import csv

with open('1.txt', 'rt') as f:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    lineData = list()
    cols = next(reader)
    print(cols)
    for col in cols:
        # Create a list in lineData for each column of data.
        lineData.append(list())
    for line in reader:
        for i in range(0, len(lineData)):
            # Copy the data from the line into the correct columns.
            lineData[i].append(line[i])
    data = dict()
    for i in range(0, len(cols)):
        # Create each key in the dict with the data in its column.
        data[cols[i]] = lineData[i]
    print(data)
data then contains each of your variables, which can be accessed via data['varname'].
So, for example, you could do data['a'] to get the list ['1', '2', '3', '4'] given the input provided in your question.
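Note that the values above are still strings. Since the question says the data is float type, here is a more compact sketch of the same dict-based idea that converts each column to floats. The helper name read_columns and the in-memory sample text are assumptions made for illustration, not part of the csv module.

```python
import csv
import io

# Stand-in for the contents of 1.txt (assumption: single-space separated).
sample = "a b c\n1 11 111\n2 22 222\n3 33 333\n4 44 444\n"


def read_columns(f):
    """Return a dict mapping each header name to a list of floats.

    Hypothetical helper for this example only.
    """
    reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
    names = next(reader)                    # first row: variable names
    rows = [row for row in reader if row]   # skip any blank lines
    columns = zip(*rows)                    # transpose rows into columns
    return {name: [float(v) for v in col] for name, col in zip(names, columns)}


data = read_columns(io.StringIO(sample))
print(data["b"])
```

With a real file, the same function could be called as read_columns(open('1.txt', 'rt')), ideally inside a with block.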
I think trying to create names based on data in your document might be a rather awkward way to do this, compared to the dict-based method shown above. If you really want to do that, though, you might look into reflection in Python (a subject I don't really know anything about).
Upvotes: 3
Reputation: 59148
The answer is: you don't want to do that.
Dictionaries are designed for exactly this purpose: the data structure you actually want is going to be something like:
data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}
... which you can then easily access using e.g. data["a"].
It's possible to do what you want, but the usual way is a hack which relies on the fact that Python uses (drumroll) a dict internally to store variables - and since your code won't know the names of those variables, you'll be stuck using dictionary access to get at them as well ... so you might as well just use a dictionary in the first place.
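To make that concrete, here is a small sketch of the hack, shown to illustrate the point rather than to recommend it. It assumes the code runs at module level, where globals() returns the dict that holds module-level variables; the name looked_up is made up for this example.

```python
# The data structure from above, with column names as keys.
data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}

# The hack: module-level variables live in the dict returned by globals(),
# so updating that dict creates variables named after the keys.
globals().update(data)

# Code that does not know the names in advance still has to look them up
# by string, which is just dictionary access with extra steps.
looked_up = {name: globals()[name] for name in data}
print(looked_up == data)
```

The lookup at the end ends up doing exactly what data[name] would have done directly, which is why the plain dictionary is the simpler choice.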
It's worth pointing out that this is deliberately made difficult in Python, because if your code doesn't know the names of your variables, they are by definition data rather than logic, and should be treated as such.
In case you aren't convinced yet, here's a good article on this subject:
Stupid Python Ideas: Why you don't want to dynamically create variables
Upvotes: 2