user2948166

Reputation: 629

Append new data to a dataframe

I have a CSV file with many columns, but for simplicity I'll explain the problem using only three of them: 'user', 'A' and 'B'. I read the file with the read_csv function in pandas, so the data is stored as a DataFrame.
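As a self-contained illustration of that reading step, the sketch below substitutes an in-memory string (via io.StringIO) for the real tab-separated file. The extra column 'C' and all sample values are hypothetical; usecols keeps only the three columns of interest.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for the real file;
# column 'C' represents the many other columns that are not needed.
raw = "user\tA\tB\tC\nu1\ta\tx\t1\nu2\tz\ty\t2\nu3\tq\tb\t3\n"

header = ['user', 'A', 'B']
# sep='\t' parses tab-separated data; usecols drops every other column.
userdata = pd.read_csv(io.StringIO(raw), sep='\t', usecols=header)

print(userdata.columns.tolist())  # ['user', 'A', 'B']
print(len(userdata))              # 3
```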

Now I want to filter the rows of this DataFrame based on their values: I only want the user rows where the value in column A is not equal to 'a' and the value in column B is not equal to 'b'.

The problem is that I want to build a DataFrame dynamically, appending one row at a time, and I don't know in advance how many rows there will be. Therefore, I cannot specify the index when defining the DataFrame.
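A common pattern when the number of rows is not known in advance is to collect the rows in a plain Python list and build the DataFrame once at the end, so no index has to be specified up front. A minimal sketch, assuming a small hypothetical userdata frame in place of the CSV contents:

```python
import pandas as pd

# Hypothetical stand-in for the data read from the CSV file.
userdata = pd.DataFrame({
    'user': ['u1', 'u2', 'u3', 'u4'],
    'A': ['a', 'z', 'q', 'r'],
    'B': ['x', 'y', 'b', 's'],
})

rows = []  # grows by one dict per kept row; no index needed in advance
for _, row in userdata.iterrows():
    if row['A'] != 'a' and row['B'] != 'b':
        rows.append({'user': row['user'], 'A': row['A'], 'B': row['B']})

# Build the DataFrame once; columns= fixes the column order
# (and gives an empty frame with these columns if no row matched).
df = pd.DataFrame(rows, columns=['user', 'A', 'B'])
print(df)
```

Appending to a list is cheap, whereas growing a DataFrame row by row copies its data on every step.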

I am using the following code:

import pandas as pd

header = ['user', 'A', 'B']
userdata = pd.read_csv('.../path/to/file.csv', sep='\t', usecols=header)

df = pd.DataFrame(columns=header)

for index, row in userdata.iterrows():
    if row['A'] != 'a' and row['B'] != 'b':
        data = {'user': row['user'], 'A': row['A'], 'B': row['B']}
        df.append(data, ignore_index=True)

The data dict is populated correctly, but the append has no effect: at the end, df is still empty.

Any help would be appreciated.

Thank you in advance.

Upvotes: 2

Views: 2792

Answers (1)

chrisaycock

Reputation: 37930

Regarding your immediate problem: append() doesn't modify the DataFrame in place; it returns a new one. So you have to reassign df:

df = df.append(data, ignore_index=True)
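Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the line above only works on older versions. A rough equivalent for current pandas is pd.concat, which likewise returns a new object. A minimal sketch, with made-up rows:

```python
import pandas as pd

# Hypothetical frame that already holds one row.
df = pd.DataFrame([{'user': 'u2', 'A': 'z', 'B': 'y'}])
data = {'user': 'u4', 'A': 'r', 'B': 's'}  # hypothetical new row

# Like the old append(), pd.concat returns a new DataFrame
# rather than modifying df, so the result must be reassigned.
df = pd.concat([df, pd.DataFrame([data])], ignore_index=True)
print(df['user'].tolist())  # ['u2', 'u4']
```

Each concat copies the existing data, so inside a loop it is usually cheaper to collect the rows in a list and build the DataFrame once at the end.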

But a better solution would be to avoid iteration altogether and simply query for the rows you want. For example:

df = userdata.query('A != "a" and B != "b"')
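The same filter can also be written as a boolean mask. The sketch below uses a small hypothetical frame to show that both forms select the same rows:

```python
import pandas as pd

# Hypothetical stand-in for the data read from the CSV file.
userdata = pd.DataFrame({
    'user': ['u1', 'u2', 'u3', 'u4'],
    'A': ['a', 'z', 'q', 'r'],
    'B': ['x', 'y', 'b', 's'],
})

# Element-wise comparisons combined with & (not `and`); the parentheses are required.
mask = (userdata['A'] != 'a') & (userdata['B'] != 'b')
df_mask = userdata[mask]

df_query = userdata.query('A != "a" and B != "b"')

print(df_mask['user'].tolist())  # ['u2', 'u4']
print(df_mask.equals(df_query))  # True
```

Both results keep the original index labels (1 and 3 here); call reset_index(drop=True) if a fresh 0-based index is wanted.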

Upvotes: 1
