Alex Brown

Reputation: 3

Using x,y coordinates to take data from a pandas DataFrame and insert it into another DataFrame

I am trying to use pandas to search for data in a dataframe and then insert the data I find into specific positions in a new dataframe.

Hypothetically I would like my code to go along the lines of this:

If row contains [A] then 

    x=data.iloc[<row>, <column selection>]

    y=data.iloc[<row>, <column selection>]

    z=data.iloc[<row>, <column selection>]

insert x to newdataframe at location (y,z)

So I would like to search each row for a value and, if the value is present, return other values from that same row using a predefined column selection.

Two of these values would then act as the x and y coordinates to place the z value into the new dataframe (the dataframe has already been made with the correct index and columns, which will match the produced x and y values).

I've tried a variety of techniques, including numpy.where, but to no avail so far. I'm quite new to Python and I'm getting stuck figuring out how to translate what I want Python to do into real code! I have tried writing my idea as real code, but I think that makes it harder to explain what I'm trying to do, so I hope this makes sense.

I appreciate any help you can give!

Upvotes: 0

Views: 341

Answers (1)

FrancescoLS

Reputation: 366

Working with an example would be better, but let's try :)

"row contains [A]" is a bit vague in pandas, since each row contains the values of all the columns at that row. So maybe you should think about it as "select the rows in which column 'c' contains A".
You can do this with: data[data['c'] == A]. This returns the subset of rows of data in which the column 'c' has the value A.
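To make the selection step concrete, here is a minimal sketch. The dataframe contents, the column names ('c', 'column_of_x', and so on) and the searched value "A" are all made up for illustration, since the question gives no sample data.

```python
import pandas as pd

# Hypothetical data: every name and value here is an assumption.
data = pd.DataFrame({
    "c": ["A", "B", "A"],
    "column_of_x": ["r1", "r2", "r3"],
    "column_of_y": ["c1", "c2", "c3"],
    "column_of_z": [10, 20, 30],
})

# Boolean Series: True where column 'c' equals "A"
mask = data["c"] == "A"

# Indexing with the mask keeps only the rows where it is True
subset = data[mask]
print(subset)
```

The comparison produces one True/False per row, so no explicit loop is needed just to find the matching rows.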

To give a more code-like form to what you wrote before:

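# Loop over only the rows where column 'c' equals A
for i, row in data[data['c'] == A].iterrows():
    x = row['column_of_x']  # row label in the new dataframe
    y = row['column_of_y']  # column label in the new dataframe
    z = row['column_of_z']  # value to place at (x, y)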

After that, what you wrote in the pseudo-code block is not consistent with the description. I'll stick to the description, in which x and y represent the location and z is the value.

In this context you should use something like new_data.loc[x, y] = z inside the loop, where x is a row label and y is a column label of new_data.
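Putting the pieces together, a small end-to-end sketch might look like the following. It assumes x and y are labels that already exist in the index and columns of the target dataframe, as the question states; the sample data and all names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical source data (names and values are assumptions).
data = pd.DataFrame({
    "c": ["A", "B", "A"],
    "column_of_x": ["r1", "r2", "r3"],
    "column_of_y": ["c1", "c2", "c3"],
    "column_of_z": [10, 20, 30],
})

# Target dataframe created in advance, filled with NaN, whose
# index and columns match the possible x and y values.
new_data = pd.DataFrame(np.nan,
                        index=["r1", "r2", "r3"],
                        columns=["c1", "c2", "c3"])

# Visit only the rows where column 'c' equals "A"
for _, row in data[data["c"] == "A"].iterrows():
    x = row["column_of_x"]   # row label in new_data
    y = row["column_of_y"]   # column label in new_data
    z = row["column_of_z"]   # value to store
    new_data.loc[x, y] = z   # label-based assignment

print(new_data)
```

Note that .loc works with labels; if x and y were integer positions instead, .iloc would be the label-free counterpart.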

As I mentioned at the very top, it would be good to have a minimal example of what you have and what you want to get. I feel like this problem can be easily addressed with pandas.DataFrame.groupby().
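Without sample data it is hard to say exactly how groupby would fit. One possible reading, purely as a guess, is that several matching rows may point at the same (x, y) location and need to be combined first. The sketch below assumes such duplicates should be summed; the data, the names and the choice of sum are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data where two matching rows share the location (r1, c1).
data = pd.DataFrame({
    "c": ["A", "A", "B", "A"],
    "column_of_x": ["r1", "r1", "r2", "r2"],
    "column_of_y": ["c1", "c1", "c2", "c2"],
    "column_of_z": [10, 5, 20, 30],
})
new_data = pd.DataFrame(np.nan, index=["r1", "r2"], columns=["c1", "c2"])

matches = data[data["c"] == "A"]

# Assumption: values landing on the same (x, y) pair are summed.
totals = matches.groupby(["column_of_x", "column_of_y"])["column_of_z"].sum()

# totals is indexed by (x, y) tuples, one entry per distinct location
for (x, y), z in totals.items():
    new_data.loc[x, y] = z

print(new_data)
```

If each location can only appear once, the groupby step adds nothing and the plain loop is enough.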

Upvotes: 0
