Ansonparkour

Reputation: 49

How to read data by line and return a dataframe

I am reading data from standard input line by line:

import sys

for line in sys.stdin:
    print(line)

Each input line looks like the following:

New York 100
Orlando 200
LA 300
D.C. 400

The output I want is a DataFrame:

         city     value
    0  New York    100
    1   Orlando    200
    2        LA    300
    3      D.C.    400

My current approach is to read each line and store all lines as a list of lists, where each inner list holds the fields of one line:

list_of_lists = []
for line in sys.stdin:
    new_list = line.split()
    list_of_lists.append(new_list)

I then convert list_of_lists to a DataFrame.

This feels clumsy, so I am wondering whether there is a better way. Thanks.

Upvotes: 1

Views: 12670

Answers (2)

piRSquared

Reputation: 294576

Use str.rsplit to split from the right, and only once:

list_of_lists = []
for line in sys.stdin:
    # sep=None splits on whitespace; maxsplit=1 keeps "New York" together
    new_list = line.rsplit(None, 1)
    list_of_lists.append(new_list)
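
To finish this first approach, the list of lists can be passed straight to the DataFrame constructor. A minimal sketch, assuming the column names city and value from the question; io.StringIO stands in for sys.stdin so the snippet runs on its own, and rows_from_lines is a hypothetical helper name:

```python
import io

import pandas as pd


def rows_from_lines(stream):
    """Split each non-empty line once from the right: [city, value]."""
    return [line.rsplit(None, 1) for line in stream if line.strip()]


# Stand-in for sys.stdin (assumption, so the example is self-contained)
sample = io.StringIO("New York 100\nOrlando 200\nLA 300\nD.C. 400\n")

rows = rows_from_lines(sample)
df = pd.DataFrame(rows, columns=["city", "value"])
print(df)
```

With real input, pass sys.stdin instead of sample. The value column still holds strings at this point.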

Or put the lines into a pandas Series first:

import sys
import pandas as pd

data = sys.stdin.read().splitlines()

pd.Series(data, name='A').str.rsplit(n=1, expand=True)
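
The expanded result has the default column labels 0 and 1, and every value is a string. A small sketch of naming the columns and converting the numbers, assuming the city/value names from the question and a hard-coded list in place of the lines read from stdin:

```python
import pandas as pd

# Hard-coded stand-in for sys.stdin.read().splitlines()
data = ["New York 100", "Orlando 200", "LA 300", "D.C. 400"]

df = pd.Series(data).str.rsplit(n=1, expand=True)
df.columns = ["city", "value"]            # replace the default labels 0 and 1
df["value"] = pd.to_numeric(df["value"])  # convert the numeric strings to integers
print(df.dtypes)
```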

Upvotes: 0

RomanPerekhrest

Reputation: 92904

import sys, re, pandas as pd

data = sys.stdin.read().splitlines()   # obtain the list of lines from stdin
data = [re.split(r'\s+(?=\d+$)', line) for line in data]  # split each line into 2 items: `city` and `value`
df = pd.DataFrame(data, columns=['city','value'])   # construct the dataframe

print(df)

The output:

       city value
0  New York   100
1   Orlando   200
2        LA   300
3      D.C.   400
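
The regular expression splits on whitespace only where it is immediately followed by digits running to the end of the line, so spaces inside a city name are left alone. A short illustration; the inputs "Area 51 100" and "Unknown" are made up for this example and do not come from the question:

```python
import re

pattern = re.compile(r"\s+(?=\d+$)")

with_space = pattern.split("New York 100")   # split only before the final number
inner_digits = pattern.split("Area 51 100")  # digits inside the name are kept
no_number = pattern.split("Unknown")         # nothing to split on: one item

print(with_space, inner_digits, no_number)
```

A line without a trailing number yields a single-item list, so such lines may need to be filtered out or handled separately before building the DataFrame.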

Upvotes: 2
