MlolM

Reputation: 247

Using chunksize to select data but keeping the same order?

Below is the program in which I use chunksize to select data from a database.

import pandas as pd

# Read my large required list (the column names to select)
subset = pd.read_csv(required_list, index_col=[0], low_memory=False).index.unique()

# Read my large database in chunks, selecting columns by name
tp = pd.read_csv(Database, iterator=True, chunksize=1000, usecols=subset, low_memory=False)
df = pd.concat(tp, ignore_index=True)

# Write the result (DataFrame.to_csv accepts chunksize but has no iterator argument)
df.to_csv(OutputFile, chunksize=1000)

But when I run the program, the order of the columns in the output file is changed.

For example:

# required_list, giving the column names that I want to select.
2
3
1

# Database
1 2 3 4 5  
a b c d e 

# OutputFile. The order is 1, 2, 3, not 2, 3, 1.
1 2 3 
a b c 

# But I want the output file to follow the same order as required_list
2 3 1 
b c a
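A minimal sketch that reproduces the behaviour, assuming a tiny in-memory CSV in place of Database (the data and the name wanted are hypothetical). usecols only decides which columns are parsed; pandas documents that element order in usecols is ignored, so the selected columns come back in file order:

```python
import io
import pandas as pd

# Hypothetical stand-in for the large Database file
database_csv = io.StringIO("1,2,3,4,5\na,b,c,d,e\n")

# Column names requested in a different order than in the file
wanted = ["2", "3", "1"]

# usecols selects WHICH columns are parsed; it does not reorder them
chunks = pd.read_csv(database_csv, usecols=wanted, chunksize=1000)
df = pd.concat(chunks, ignore_index=True)

print(list(df.columns))  # ['1', '2', '3'], i.e. file order
```

The names here are strings because pandas reads header names as strings. If the values taken from required_list end up as integers, pandas treats integer usecols as column positions rather than names.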

So my question is: how can I revise the program so that it selects the data but keeps the same column order as required_list? I need iterator and chunksize because the data is quite large.

Can anyone help?

Upvotes: 1

Views: 198

Answers (1)

MaxU - stand with Ukraine

Reputation: 210882

You can do it this way:

df = pd.concat(tp, ignore_index=True)[subset]

pd.concat(tp, ignore_index=True) returns a DataFrame, and df[list_of_cols] returns a DataFrame whose columns are ordered as in list_of_cols.
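The sketch below applies this to toy data, then shows an alternative approach (not from the original answer) that reorders and writes each chunk as it is read, so the full result never has to be concatenated in memory. The in-memory data, chunksize=1 and the StringIO standing in for OutputFile are assumptions for illustration:

```python
import io
import pandas as pd

# Hypothetical stand-ins for the question's Database file and required_list
database_csv = "1,2,3,4,5\na,b,c,d,e\nf,g,h,i,j\n"
subset = pd.Index(["2", "3", "1"])  # required column order

# 1) The answer's approach: read in chunks, concatenate, then reorder
tp = pd.read_csv(io.StringIO(database_csv), usecols=subset, chunksize=1)
df = pd.concat(tp, ignore_index=True)[subset]
print(df.columns.tolist())  # ['2', '3', '1']

# 2) Alternative (assumption: the full result may not fit in memory):
#    reorder each chunk and write it out immediately instead of concatenating
out = io.StringIO()  # stands in for OutputFile
reader = pd.read_csv(io.StringIO(database_csv), usecols=subset, chunksize=1)
for i, chunk in enumerate(reader):
    # Write the header only once, with the first chunk
    chunk[subset].to_csv(out, header=(i == 0), index=False)

print(out.getvalue())
```

With a real path, one option is to open the file once with open(OutputFile, "w", newline="") and pass that handle to every to_csv call in the loop.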

Upvotes: 1
