Frantisek Farkas
Frantisek Farkas

Reputation: 3

parsing tab delimited values from text file to variables

Hello I've been struggling with this problem, I'm trying to iterate over rows and select data from them and then assign them to variables. this is the first time I'm using pandas and I'm not sure how to select the data

reader = pd.read_csv(file_path, sep="\t" ,lineterminator='\r', usecols=[0,1,2,9,10],)

for row in reader:
    print(row)
    #id_number = row[0]
    #name = row[2]
    #ip_address = row[1]
    #latitude = row[9]

and this is the output from the row that I want to assign to the variables:

050000

129.240.228.138

planetlab2.simula.no

59.93

Edit: Perhaps this is not a problem for pandas but for general Python. I am fairly new to python and what I'm trying to achieve is to parse tab separated file line by line and assign data to the variables and print them in one loop.

this is the input file sample:

050263  128.2.211.113   planetlab-1.cmcl.cs.cmu.edu NA  US  Allegheny County    Pittsburgh  http://www.cs.cmu.edu/  Carnegie Mellon University  40.4446 -79.9427    unknown
050264  128.2.211.115   planetlab-3.cmcl.cs.cmu.edu NA  US  Allegheny County    Pittsburgh  http://www.cs.cmu.edu/  Carnegie Mellon University  40.4446 -79.9427    unknown

Upvotes: 0

Views: 1970

Answers (3)

brad sanders
brad sanders

Reputation: 2539

Are the values you need to add the same for each row, or does it require processing the value to determine the value of the addition? If it is consistent you can apply this sum simply using pandas to do a matrix operation on the dataset. If it requires processing row by row, the above solution is the correct one for sure. If it is a table of variables that must be added row by row, you can do that by dumping them all into a column aligned with your dataset, do the addition by row using pandas, and simply print out the complete dataframe. Assume you have three columns to add, which you put into a new column[e].

df['e'] = df.a + df.b + df.d

or, if it is a constant:

df['e'] = df.a + df.b + {constant}

Then drop the columns you don't need (ex df['a'] and df['b'] in the above)

Obviously, then, if you need to calculate based on unique values for each row, put the values into another column and sum as above.

Upvotes: 0

Chris Menezes
Chris Menezes

Reputation: 71

The general workflow you're describing is: you want to read in a csv, find a row in the file with a certain ID, and unpack all the values from that row into variables. This is simple to do with pandas.

It looks like the CSV file has at least 10 columns in it. Providing the usecols arg should filter out the columns that you're not interested in, and read_csv will ignore them when loading into the pandas DataFrame object (which you've called reader).

Steps to do what you want:

  1. Read the data file using pd.read_csv(). You've already done this, but I recommend calling this variable df instead of reader, as read_csv returns a DataFrame object, not a Reader object. You'll also find it convenient to use the names argument to read_csv to assign column names to the dataframe. It looks like you want names=['id', 'ip_address', 'name', 'latitude','longitude'] to get those as columns. (Assuming col10 is longitude, which makes sense that 9,10 would be lat/long pairs)
  2. Query the dataframe object for the row with that ID that you're interested in. There are a variety of ways to do this. One is using the query syntax. Hard to know why you want that specific row without more details, but you can look up more information about index lookups in pandas. Example: row = df.query("id == 50000")
  3. Given a single row, you want to extract the row values into variables. This is easy if you've assigned column names to your dataframe. You can treat the row as a dictionary of values. E.g. lat = row['lat'] lon = row['long]

Upvotes: 1

ajmartin
ajmartin

Reputation: 2409

You can use iterrows():

df = pandas.read_csv(file_path, sep=',')
for index, row in df.iterrows():
    value = row['col_name']

Or if you want to access by index of the column:

df = pandas.read_csv(file_path, sep=',')
for index, row in df.iterrows():
    value = row.ix[0]

Upvotes: 0

Related Questions