Mirth Community

Question re DB Reader Channel Parallelization (http://www.mirthcorp.com/community/forums/showthread.php?t=216089)

ahart 06-03-2016 09:44 AM

Question re DB Reader Channel Parallelization
 
I have a DB reader channel, "Patient Updater", developed in Mirth 3.2 that fetches n rows each time the channel polls. Then, for each message, we call a web service in one destination, and in a second destination we update the database. (I should add that the technical contact for the web service says they can easily handle 10 concurrent requests per second.) I don't care in which order the messages are processed; I just want to grab n rows and then bang through them as fast as possible, as long as the result of the web service call is accurately reflected in the next destination that updates the database.

I downloaded Mirth 3.4, loaded up my channels, global scripts, and code templates. I set the maximum number of processing threads to 10.

When I run it, all the lines in the log say "< pool-6-thread-1".

...

So, it seemed to me that I wasn't really seeing parallel processing of messages. If you have a Database Reader where the source transformer script is going to read 1000 rows, and you have 10 threads specified for the source connector, how is that handled?

I decided to re-architect it a bit: I duplicated the channel, named the copy "Process Patient", and changed its source from a Database Reader to a Channel Reader. Then, I modified the original Database Reader channel to get rid of the two destinations used to call the web service and update the database, replacing them with a single Channel Writer to this new channel. I made everything queued: the DB reader channel source, the Channel Writer destination, and the new Channel Reader; everything except the two destinations actually doing the work of calling the service and updating.

When I ran this, the log file went nuts and I started seeing parallel processing of the messages. So much so that I was also getting Donkey connector errors thrown.

I went back and tweaked it a bit: on the Database Reader, I dropped it down to 1 processing thread (since it's reading 1000 messages at a time anyway), but I left the source queue on (buffer size 1000). The Channel Writer destination to "Process Patient" is always queued, with 10 threads and a buffer size of 1000. Again, the Channel Reader "Process Patient" has the source queue on, a buffer size of 1000, and 10 threads. Now it screams through the entire database, and I see what looks like simultaneous calls to the web service going on.

So, what have I learned from this? Perhaps I'm overdoing it by making everything queued? Or is that the only way to avoid a bottleneck? If I create a channel with up to 10 processing threads and there's a queued destination that also has 10 processing threads specified, is it theoretically possible that I could have up to 100 threads going for that destination?

---

I've increased the number of records the DB reader fetches to 2000 every poll. Everything still seems to be working. Unscientifically, based on the log timestamps, it's taking right at 4 minutes to process 1798 patients, so my throughput is about 7.5 patients per second. The db looks correct after it finishes.
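As a quick check of that estimate, the arithmetic can be sketched in Python. The 4-minute figure is the approximate value read off the log timestamps, and the function name is purely illustrative:

```python
def throughput_per_second(message_count: int, elapsed_seconds: float) -> float:
    """Return the average number of messages processed per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return message_count / elapsed_seconds


# 1798 patients in roughly 4 minutes (240 s), per the log timestamps.
rate = throughput_per_second(1798, 4 * 60)
print(f"{rate:.1f} patients per second")  # prints "7.5 patients per second"
```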

However, I *am* seeing the following error in the Mirth logs, although there are NO errors recorded in the console:

DEBUG 2016-06-03 15:14:03,844 [Database Reader Select JavaScript Task on Bulk Updater (b591aad7-0385-4a88-adfa-cd9dfceb8c9c) < pool-6-thread-83] db-connector: Bulk Updater polling database, requesting 2000 patient records...
DEBUG 2016-06-03 15:14:07,676 [Database Reader Select JavaScript Task on Bulk Updater (b591aad7-0385-4a88-adfa-cd9dfceb8c9c) < pool-6-thread-83] db-connector: Bulk Updater processing 1798 messages.
...
ERROR 2016-06-03 15:14:07,882 [Channel Dispatch Thread on Process Patient (083af5d7-d353-4869-ad07-c522f0a63b53) < Destination Queue Thread 5 on Bulk Updater (b591aad7-0385-4a88-adfa-cd9dfceb8c9c), Write to ProcessPatient (5)] com.mirth.connect.donkey.server.channel.Channel: Error processing message in channel Process Patient (083af5d7-d353-4869-ad07-c522f0a63b53).
com.mirth.connect.donkey.server.channel.ChannelException: com.mirth.connect.donkey.server.data.DonkeyDaoException: java.sql.SQLIntegrityConstraintViolationException: The statement was aborted because it would have caused a duplicate key value in a unique or primary key constraint or unique index identified by 'D_M2_PKEY' defined on 'D_M2'.
at com.mirth.connect.donkey.server.channel.Channel.dispatchRawMessage(Channel.java:1207)
...
DEBUG 2016-06-03 15:18:03,857 [Database Writer JavaScript Task on Process Patient




Should I be concerned about this?


***
Updated:
Just by caching the db connection I went to 33.5 patients per second processed, i.e., a web service call made for each patient followed by an update of the patient record. Initially I had the PreparedStatement cached in the globalChannelMap, but quickly realized that the threads were fighting over it, so don't do that. In general, I think you'd have to be pretty careful with any globalChannelMap objects when you turn on multiple threads.
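The general idea of giving each thread its own cached object, rather than sharing one across threads, can be sketched as follows. This is a language-neutral illustration in Python, not Mirth script code: `FakeConnection`, `get_connection`, and `process` are hypothetical names, and the stand-in connection does no real database work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    created = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.created += 1
        self.owner = threading.get_ident()

    def update_patient(self, patient_id):
        # A real connection would run a prepared UPDATE here.
        # Each instance must only ever be used by the thread that made it.
        if self.owner != threading.get_ident():
            raise RuntimeError("connection used by another thread")
        return patient_id


_local = threading.local()  # one independent attribute namespace per thread


def get_connection():
    """Return this thread's cached connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = FakeConnection()
        _local.conn = conn
    return conn


def process(patient_id):
    return get_connection().update_patient(patient_id)


with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(process, range(1798)))

print(len(results), "patients processed using", FakeConnection.created, "connections")
```

Each worker pays the connection cost once and never touches another worker's object, so nothing is contended and no connection is rebuilt per message.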

Still getting those errors in the log, however. Someone?

narupley 06-06-2016 07:56 AM

Yes, it is absolutely true that you need to be careful with shared memory (global map, global channel map) when using multiple processing threads.

Polling connectors like the Database Reader will still basically poll messages in serial; however, you can still increase throughput by turning on the source queue. That way you'll have X source queue threads consuming messages, where X is your max processing threads.
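The pattern described here, one serial poller feeding a queue that X worker threads drain, can be sketched roughly as below. This is a generic producer/consumer illustration in Python, not Mirth's actual implementation; `poll_rows`, `worker`, `run`, and the thread names are all hypothetical.

```python
import queue
import threading

MAX_PROCESSING_THREADS = 10  # stands in for the channel's max processing threads
STOP = object()              # sentinel telling a worker to exit


def poll_rows(n):
    """Hypothetical poll: returns n rows from one serial fetch."""
    return [{"patient_id": i} for i in range(n)]


def worker(source_queue, processed, lock):
    while True:
        row = source_queue.get()
        if row is STOP:
            break
        # A real channel would call the web service and update the database here.
        with lock:
            processed.append((row["patient_id"], threading.current_thread().name))


def run(n_rows):
    source_queue = queue.Queue()
    processed, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(source_queue, processed, lock),
                         name=f"source-queue-{i}")
        for i in range(MAX_PROCESSING_THREADS)
    ]
    for t in threads:
        t.start()
    for row in poll_rows(n_rows):  # the poll itself stays serial
        source_queue.put(row)
    for _ in threads:              # one sentinel per worker
        source_queue.put(STOP)
    for t in threads:
        t.join()
    return processed


processed = run(1000)
print(len(processed), "rows handled by",
      len({name for _, name in processed}), "threads")
```

The poller never waits on the slow per-message work; it only enqueues, and the workers pick rows up in whatever order they become free.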

ahart 06-06-2016 08:25 AM

Do I need to worry about this java.sql.SQLIntegrityConstraintViolationException I am seeing in the logs? As nearly as I can tell, it doesn't seem to be affecting the outcome.

narupley 06-06-2016 08:37 AM

What backend database are you using?

ahart 06-06-2016 10:31 AM

I presume you mean for the Mirth schema, not the database I'm updating. So, Derby in development.
We would be using Oracle in testing and production.

narupley 06-06-2016 11:26 AM

I see. Yeah, you'll definitely see those types of errors with Derby, since it does not work well with multi-threading and message sequences: MIRTH-3871

You shouldn't see any of those message ID constraint violations in the supported production databases though, just Derby.

ahart 06-06-2016 11:49 AM

Channel A was the database reader. It is calling a database package that returns 2000 patients in a fetch. There are actually 1841 patients in the test database. I had it set to Source Queued, no response, 10 threads.

Channel A has a single destination, to write to Channel B. That Channel Writer destination was set to always queued. 10 threads.

Channel B is a Channel Reader. It is set to Source Queue On, 10 threads. It has two destinations, one to call the web service and one to update the db; neither is queued.

Initially, my counts were getting off. I expected 1841 messages, but I was seeing many more than that. I tried dropping the Channel A source queuing and number of threads to 1, and it seemed my counts were correct, yet I was still seeing those errors in the log. Since the results were correct in the target database, I wasn't even sure how it was working at all. I never saw the messages that were flagged with errors on the write to Channel B getting reprocessed.

Finally, I turned off all queuing in Channel A, leaving all the queuing on the Channel B source connector. That seems to have gotten rid of all of the errors in the logs.

My throughput dropped a little, down to 29 patients per second, still more than acceptable.

