Questions on In Line Devices - E1.20 RDM (Remote Device Management) Protocol Forums

Andrew_Parlane · August 27th, 2020

Hi all, we're implementing a new transparent in-line device and I have a couple of questions on my interpretation of the spec.

1) 4.2.1.1 Port Turnaround during Discovery. What counts as a discovery request here? Is it anything with the DISCOVERY command class? Or is it only DISC_UNIQUE_BRANCH?

DISC_UNIQUE_BRANCH is the only message that should have collisions, and as such this rule seems like it should only apply to that, but I saw some other posts here suggesting that it was the entire DISCOVERY command class.

2) When should an in-line device return to forwards data flow during non discovery transactions, when errors are detected with the responder's response?

Let's say that we have a responder that watchdogs at some point during its transmission. The in-line device has to eventually time out and return to forward data flow in order to catch the controller's next request. I think the key to this is: how should a controller treat a partial response? Table 3-2, Controller Packet Spacing Times, has:

line 4: responder response -> controller any packet, min 176us
line 5: controller request -> controller any packet, min 3ms

What should the controller consider the start of a reply? A falling edge on the line, regardless of whether it's for a valid break or not?

3) 4.2.1 Port Turnaround states:

Quote:

After receiving an RDM request packet, the first port that is pulled to a low state for the start of a BREAK becomes the active port. Note that this port may be the responder port, in which case the in-line device shall return to forward data flow. Otherwise, data from the active port shall drive the responder port and may drive all other command ports on the in-line device.

In the case where a command port receives the start of a response should we disable reception from all other command ports? A well behaving system should not have two responders respond in this case since it's not a discovery message, but there could be badly behaving responders on a system causing this collision. Should the in-line device block that or allow it to reach the controller?

Edit:

4) If the responder were to send a break and then never send its first data slot, the controller should time out the MAB after 88us according to table 3-1. When can it start its next frame? Table 3-2 line 4 would suggest 176 us after the last rising edge of the responder (end of break). Since it's already timed out the MAB after 88us, that would mean the controller could start to transmit in 176us - 88us?

Same for a timeout in the inter-slot timing. The controller would time out an inter-slot after 2.1ms, so when could the controller start its next reply? It should be 176us after the last stop bit of the last responder slot, but it's already been 2.1ms since then, so could it start immediately?

In these two cases when should the in-line device turn the bus around?

Thanks,
Andrew

ericthegeek · August 27th, 2020

I've posted in the past about Splitters and Inline devices. Search the forums for my previous posts and you should be able to find them.

Having built a fully protocol-aware transparent inline device, my advice is: Don't try to build a fully protocol-aware transparent inline device. The error handling will drive you nuts!

Quote:

Originally Posted by Andrew_Parlane

1) 4.2.1.1 Port Turnaround during Discovery. What counts as a discovery request here? Is it anything with the DISCOVERY command class? Or is it only DISC_UNIQUE_BRANCH?

This only applies to the Discover Unique Branch Command.

Quote:

Originally Posted by Andrew_Parlane

2) When should an in-line device return to forwards data flow during non discovery transactions, when errors are detected with the responder's response?

Error handling is the hardest part of building a protocol-aware inline device. The standard is silent on the matter, and much of the time there is no good solution when you detect an error.

The "partial response" problem you mention here is one problem. The other is what to do when there's a checksum error in a request or response. It may look like a checksum error to you, but if the error is due to noise or analog effects on the 485 line it could still look perfectly fine to other devices.

The only advice I can give is: When in doubt, cut off the downstream branch(es) of your inline device, switch back the upstream port that has the controller attached to it, and wait for the controller to come to its senses. This may cause additional corruption to that specific request/response, but should leave your system ready for the next valid request.

Quote:

Originally Posted by Andrew_Parlane

What should the controller consider the start of a reply? A falling edge on the line, regardless of whether it's for a valid break or not?

You can't just look for a falling edge. You'll need a glitch filter to deal with noise and 485 line driver enable/disable transients.

The shortest valid low you should ever see on the 485 line is 4us. It's common to see glitches that are a few hundred nanoseconds long. Most 16x oversampling UARTs won't see them, but a timer or interrupt based edge detector will trigger on them.

If you see any low events that are <1us you should filter them out and ignore them.
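A minimal sketch of that kind of filter, written in Python for readability rather than as firmware. It assumes the line has already been reduced to a list of (level, duration in ns) runs; the name `filter_glitches` and the `min_low_ns` parameter are made up for illustration and do not come from the standard.

```python
def filter_glitches(runs, min_low_ns=1000):
    """Drop low pulses shorter than min_low_ns from (level, duration_ns) runs.

    A filtered-out low is treated as if the line had stayed high, so its
    duration is merged into the neighbouring high runs.
    """
    out = []
    for level, duration in runs:
        if level == 0 and duration < min_low_ns:
            level = 1  # too short to be real data: treat it as high
        if out and out[-1][0] == level:
            # Same level as the previous run: extend it instead of adding a new one.
            out[-1] = (level, out[-1][1] + duration)
        else:
            out.append((level, duration))
    return out


# Idle line, a 300 ns glitch, more idle, then a real 176 us break.
line = [(1, 50_000), (0, 300), (1, 20_000), (0, 176_000), (1, 12_000)]
filtered = filter_glitches(line)
```

With this input the 300 ns low disappears into the surrounding idle time, while the break (and any 4us low, the shortest valid one mentioned above) passes through untouched.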

Quote:

Originally Posted by Andrew_Parlane

In the case where a command port receives the start of a response should we disable reception from all other command ports? A well behaving system should not have two responders respond in this case since it's not a discovery message, but there could be badly behaving responders on a system causing this collision. Should the in-line device block that or allow it to reach the controller?

There's no right answer to this question. If you have a good glitch filter, you can usually disable the other ports. However, there are some poorly behaved responders that will put a 1 to 2us glitch on the line even when they are not responding. If you treat this as the start of a response, then you risk cutting off the port that has the real responder.

But, if you leave the other ports enabled and AND all of them together you risk allowing a glitch through from another branch that will corrupt an otherwise valid response.

Quote:

Originally Posted by Andrew_Parlane

4) If the responder were to send a break and then never send its first data slot, the controller should time out the MAB after 88us according to table 3-1. When can it start its next frame? Table 3-2 line 4 would suggest 176 us after the last rising edge of the responder (end of break). Since it's already timed out the MAB after 88us, that would mean the controller could start to transmit in 176us - 88us?

If you strictly follow the standard, then the MAB should never be more than 88us. But, the standard allows responders to have a 2ms pause between data bytes in a response. This 2ms was included to allow a high priority interrupt to do something like update a motion servo. However, some responders have a 1 to 2 ms interrupt that can trigger at any time during the response, including during a Break+MAB. Thus, it's best not to be too strict about enforcing the response's Break+MAB.

And of course: Test, test, test. There are all kinds of corner cases. When I designed mine I built an "RDM Devil" that did nothing more than inject 1 to 4us pulses at very specific times to test my error handling. Injecting them during the 176us time period between packets was especially effective. This is the time period when the 485 line is undriven, and should be held high by the pull-apart resistors, but it often sees noise.

Andrew_Parlane · August 28th, 2020

Thanks for the reply. You raise some interesting points. Glitches are something I hadn't thought much about, but I can see how they could be an issue. I think I can modify my design to deal with them correctly without too much effort.

Quote:

Having built a fully protocol-aware transparent inline device, my advice is: Don't try to build a fully protocol-aware transparent inline device. The error handling will drive you nuts!

Unfortunately any transparent in-line device must be protocol aware right? You must at least check for the RDM start code and the length field, so that you know when to turn the bus around. You must also be able to tell if it's a discovery packet or not, so you can follow the rules for discovery vs non-discovery turnarounds.

Quote:

The "partial response" problem you mention here is one problem. The other is what to do when there's a checksum error in a request or response. It may look like a checksum error to you, but if the error is due to noise or analog effects on the 485 line it could still look perfectly fine to other devices.

A CRC error in a response shouldn't matter to you as an in-line device. By the time you've detected it, the data has already passed through and there's nothing you could do even if you wanted to. So I don't think there's any point even checking that, unless you want to do so for statistic generation purposes.

A CRC error in a request is interesting, although only in the case where the in-line device sees it as an error but the actual responders don't. Theoretically you could interpret it as a non-discovery request when it's actually a discovery request. However, I feel that's pretty unlikely, since you'd need to just happen to corrupt the 3 bytes of CC and PID.

I would also argue that if the in-line device detects errors frequently enough to be annoying, then that would indicate a problem with your setup: either it's too prone to noise or something else on the bus is not compliant. If the in-line device were replaced by a normal responder in the same location it too would detect CRC errors at that point.

So I don't think there's too much of an issue with invalid CRCs. I would propose that the in-line device shouldn't even check the CRC and just turn the bus around as normal and let the responders deal with that. If they respond, then you're ready for them; if not, you revert back to forward data flow when you detect the start of a new break on the responder port.

Quote:

You can't just look for a falling edge. You'll need a glitch filter to deal with noise and 485 line driver enable/disable transients.

The shortest valid low you should ever see on the 485 line is 4us. It's common to see glitches that are a few hundred nanoseconds long. Most 16x oversampling UARTs won't see them, but a timer or interrupt based edge detector will trigger on them.

If you see any low events that are <1us you should filter them out and ignore them.

Agreed, I should be able to filter anything shorter than ~900ns without issues, and I think I have a software scheme to deal with anything up to about 2us, if there are glitches longer than that, then there's not much I can do.

Quote:

There's no right answer to this question. If you have a good glitch filter, you can usually disable the other ports. However, there are some poorly behaved responders that will put a 1 to 2us glitch on the line even when they are not responding. If you treat this as the start of a response, then you risk cutting off the port that has the real responder.

But, if you leave the other ports enabled and AND all of them together you risk allowing a glitch through from another branch that will corrupt an otherwise valid response.

Quote:

However, some responders have a 1 to 2 ms interrupt that can trigger at any time during the response, including during a Break+MAB. Thus, it's best not to be too strict about enforcing the response's Break+MAB.

OK, that's good to know.

Thanks again,
Andrew

How often have you seen these 1us - 2us glitches? Could you comment on which fixtures do this? They may be useful to test with. Is it common enough a problem that this needs to be handled correctly? Or is it sufficiently rare that the problem can be ignored and listed as a known issue?

ericthegeek · August 28th, 2020

Quote:

Originally Posted by Andrew_Parlane

Unfortunately any transparent in-line device must be protocol aware right?

Not true at all! I know of 3 splitters out there that have no protocol awareness whatsoever, and I'm sure there are more. Two of those non-aware models are my favorite splitters. They are the "Keep It Simple" principle incarnate, and they tend to work well even in difficult installations. (One of those two favorites is one I built, so of course I think it's the best)

I've built both a fully protocol aware splitter, and a simple splitter. The simple one has nothing more than falling edge detector (with glitch filter of course) and a few 8 to 20us timers.

If you didn't find it already, make sure you've read this thread:
https://www.rdmprotocol.org/forums/s...ead.php?t=1248
In my responses I touch on the architecture for a "simple" splitter.

Quote:

Originally Posted by Andrew_Parlane

I would also argue that if the in-line device detects errors frequently enough to be annoying, then that would indicate a problem with your setup: either it's too prone to noise or something else on the bus is not compliant. If the in-line device were replaced by a normal responder in the same location it too would detect CRC errors at that point.

Spherical Cows, Frictionless Surfaces, and Error-Free communication lines. All three are nice theoretical abstractions that don't exist in the real world.

Another problem is that inline devices often expose problems with other RDM devices. Most controllers only listen when they are expecting a response, so if a responder yacks when it shouldn't the controller won't hear it and it doesn't matter. Same with responders, if they get a bunch of garbage, they can just ignore it and wait for a new break. But inline devices have to watch everything that's happening, and make decisions on that activity.

Quote:

Originally Posted by Andrew_Parlane

So I don't think there's too much of an issue with invalid CRCs. I would propose that the in-line device shouldn't even check the CRC and just turn the bus around as normal and let the responders deal with that. If they respond, then you're ready for them; if not, you revert back to forward data flow when you detect the start of a new break on the responder port.

To know when to turn the line around, you need to know when the request is finished. And to know when the request is finished you need to parse the length field. But to know if the length field is valid, you have to validate the checksum, otherwise you risk getting a garbage length field.

Consider this scenario: Your UART mis-frames on some noise following the request's break, so you read the length shifted by 3 bits. The responder doesn't mis-frame and sees a valid 30 byte request, so it starts responding with a long response. In the meantime, you're waiting for 240 bytes from the controller, but you never get them, so you time out after a few ms and return to forward flow. The controller also times out because you're in forward flow and it can't hear the responder. So the controller starts sending more data. You now have a collision on the downstream segment, and probably some flashing lights. It's even worse if the controller sends another request immediately. You see that new request, and turn around based on the activity from the responder that is still sending its long response to the previous request.

If you're going to use any data from the packet, you need to validate the checksum. And, because the RDM additive checksum is really weak (it's not a CRC), I recommend validating some other structural elements of the packet, such as the Length field matching the PDL+24. If the structural elements are wrong, treat it like a checksum error.
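As an illustration of checking both the checksum and the structure, here is a minimal sketch in Python. It assumes the whole request has been buffered, and that the E1.20 layout is: slot 0 start code 0xCC, slot 1 sub-start code 0x01, slot 2 Message Length, slot 23 PDL, and a 16-bit additive checksum (high byte first) immediately after the slots counted by Message Length. The function and constant names are hypothetical.

```python
RDM_START_CODE = 0xCC
RDM_SUB_START_CODE = 0x01
HEADER_SLOTS = 24   # slots 0..23, up to and including the PDL slot
PDL_SLOT = 23


def rdm_checksum(slots):
    """16-bit additive checksum over every slot before the checksum itself."""
    return sum(slots) & 0xFFFF


def request_looks_valid(packet):
    """Return True only if the structure and the checksum both agree."""
    if len(packet) < HEADER_SLOTS + 2:
        return False
    if packet[0] != RDM_START_CODE or packet[1] != RDM_SUB_START_CODE:
        return False
    length = packet[2]
    # Structural check: Message Length must equal PDL + 24.
    if length != packet[PDL_SLOT] + HEADER_SLOTS:
        return False
    if len(packet) != length + 2:
        return False
    received = (packet[length] << 8) | packet[length + 1]
    return received == rdm_checksum(packet[:length])


def build_packet(pdl_data=b""):
    """Test helper: a header of zeros with the framing fields filled in."""
    body = bytearray(HEADER_SLOTS)
    body[0] = RDM_START_CODE
    body[1] = RDM_SUB_START_CODE
    body[2] = HEADER_SLOTS + len(pdl_data)
    body[PDL_SLOT] = len(pdl_data)
    body += pdl_data
    checksum = rdm_checksum(body)
    return bytes(body) + bytes([checksum >> 8, checksum & 0xFF])


good = build_packet(b"\x01\x02")

# Two compensating errors: length +1, one data byte -1. The sum is unchanged.
bad = bytearray(good)
bad[2] += 1
bad[24] -= 1
checksum_alone_passes = rdm_checksum(bad[:26]) == ((bad[26] << 8) | bad[27])
```

The `bad` packet shows why the additive checksum on its own is weak: the two errors cancel in the sum, so only the Length versus PDL+24 comparison rejects it.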

Quote:

Originally Posted by Andrew_Parlane

How often have you seen these 1us - 2us glitches? Could you comment on which fixtures do this? They may be useful to test with. Is it common enough a problem that this needs to be handled correctly? Or is it sufficiently rare that the problem can be ignored and listed as a known issue?

2-way radios and noisy VFD motor drivers are your opponent. It's more common to see glitches in the 100 to 500ns range than 1 to 2 us.

Andrew_Parlane · September 8th, 2020

Sorry for the slow reply, I didn't get a chance to look at this last week, and wanted to make sure I understood everything properly before responding.

So by a "simple" splitter what you mean is:

Starts in forward data flow mode, ignoring everything from the controller.
When it sees a falling edge from any command port it swaps to backward data flow.
When it next sees a falling edge from the responder port it switches back to forward data flow, and repeats.

Then the edge detection should be filtered for glitches. For non-discovery responses this is fine because that (let's say 2us) filter eats into the break time, which is no issue because you can shorten the break by up to 22us. However in discovery responses that 2us filter eats into the start bit of the first preamble of the DUB response, which is potentially a problem. You're permitted to eat the first byte of that, but since we are not protocol aware we don't know what we are receiving. Taking 2us out of a start bit may well cause issues with the controller, and cause it to think there was a collision, when in fact there was none. One fix would be to have a few us delay on the bus, so that when you turn the bus around you don't drop anything, unfortunately our hardware doesn't have support for that.

The other issue I see with this approach is that when in forward data flow with the controller transmitting, that data gets forwarded out of the command ports, so if you are also monitoring the command ports for Rx, you'll see the data you just forwarded. So you have to determine what's a falling edge from an actual responder and not forwarded data.

Quote:

Another problem is that inline devices often expose problems with other RDM devices. Most controllers only listen when they are expecting a response, so if a responder yacks when it shouldn't the controller won't hear it and it doesn't matter. Same with responders, if they get a bunch of garbage, they can just ignore it and wait for a new break. But inline devices have to watch everything that's happening, and make decisions on that activity.

The point you made in the linked thread about the splitter manufacturer being blamed for badly behaved responders is a very good point, and I liked your comment about disabling that port and informing the controller about why.

Quote:

Consider this scenario: Your UART mis-frames on some noise following the request's break, so you read the length shifted by 3 bits. The responder doesn't mis-frame and sees a valid 30 byte request, so it starts responding with a long response. In the meantime, you're waiting for 240 bytes from the controller, but you never get them, so you time out after a few ms and return to forward flow. The controller also times out because you're in forward flow and it can't hear the responder. So the controller starts sending more data. You now have a collision on the downstream segment, and probably some flashing lights. It's even worse if the controller sends another request immediately. You see that new request, and turn around based on the activity from the responder that is still sending its long response to the previous request.

What's the alternative to this?

In your scenario, the splitter is in forward data flow waiting for a bunch of data from the controller that's not going to arrive. The responder sends a reply, but that gets ignored because we're in forward data flow. At some point the controller times out and sends another break, and the splitter resets its state machine and starts the new packet. The responder may or may not be transmitting at this point, which could cause collisions on that command port.

With CRC checking in the splitter we instead have: the splitter does exactly the same as before until it gets to the CRC. It checks that, finds it's wrong, and so does nothing and doesn't turn around, waiting for the next frame from the controller. The responder doesn't see the CRC error and starts responding, but the data is ignored because we're in forward data flow. The controller times out and sends another frame, and collisions may occur on the command port.

The behaviour is exactly the same whether or not you check the CRC in that case.

The opposite case is similar, if we misread the length as shorter than actual. In this case if the splitter doesn't check the CRC then we turn around too early and drop part of the request. The responder doesn't see a valid packet and drops it, and at some point the splitter times out and goes back to forward data flow (hopefully before the controller sends another frame).

The other potential issue is if the splitter sees something as a DUB when actually it's non-discovery or vice versa.

If the splitter thinks it's a DUB and therefore follows the discovery turnaround rules from section 4.2.1.1, but actually it's not a discovery request, then we run the risk of cutting off half of the responder's reply or, in the case of no reply, potentially switching back to forward data flow in the middle of the controller's next request.

That scenario seems unlikely because you'd have to error on the PID and the CC, but it's easy enough to check that CRC and so I guess we may as well.

You've given me plenty to think about,
Thanks

ericthegeek · September 13th, 2020

Quote:

Originally Posted by Andrew_Parlane

So by a "simple" splitter what you mean is:
Starts in forward data flow mode, ignoring everything from the controller.
When it sees a falling edge from any command port it swaps to backward data flow.
When it next sees a falling edge from the responder port it switches back to forward data flow, and repeats.

A simple splitter sits with all ports in tri-state until it sees a falling edge on one port. It then starts driving the data from that port to the other ports until the source port goes idle for about 100us. Then it goes back to tristate and waits for another edge (which may come from the same port, or a different port). This technique works best if you have a delay line.
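The behaviour described above can be sketched as a small state machine. This is only an illustrative model in Python, not firmware: the class name, the port names, and the exact 100us constant are assumptions, the edges are assumed to be glitch-filtered already, and the delay line is not modelled. Note that the idle timer only runs while the source line is high, so a long break does not count as idle.

```python
IDLE_TIMEOUT_US = 100  # "about 100us" of idle before releasing the bus


class SimpleSplitter:
    """All ports tri-stated until one goes low; that port then drives the rest."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.source = None         # None means every port is tri-stated
        self.high_since_us = None  # when the source last went high; None while low

    def on_edge(self, port, level, now_us):
        """Feed every filtered edge: level 0 is falling, level 1 is rising."""
        self._expire(now_us)
        if self.source is None:
            if level == 0:
                self.source = port  # first port to go low becomes the source
                self.high_since_us = None
            return
        if port != self.source:
            return  # other ports are ignored while a source is active
        self.high_since_us = now_us if level == 1 else None

    def driven_ports(self, now_us):
        """Ports currently being driven with the source port's data."""
        self._expire(now_us)
        if self.source is None:
            return []
        return [p for p in self.ports if p != self.source]

    def _expire(self, now_us):
        if (self.source is not None and self.high_since_us is not None
                and now_us - self.high_since_us >= IDLE_TIMEOUT_US):
            self.source = None  # source has been idle long enough: tri-state
            self.high_since_us = None


splitter = SimpleSplitter(["upstream", "down1", "down2"])
splitter.on_edge("upstream", 0, 0)         # controller starts a break
during_break = splitter.driven_ports(150)  # line still low, so still driving
splitter.on_edge("upstream", 1, 176)       # line returns high
after_idle = splitter.driven_ports(300)    # high for over 100us: tri-state
splitter.on_edge("down1", 0, 400)          # a responder answers
during_reply = splitter.driven_ports(410)
```

Nothing in the model knows which port is the controller; the direction is decided purely by which port goes low first after an idle period.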

Standalone 485 splitters that do this have existed for decades, and work with many industrial protocols.

Quote:

Originally Posted by Andrew_Parlane

Then the edge detection should be filtered for glitches. For non-discovery responses this is fine because that (let's say 2us) filter eats into the break time, which is no issue because you can shorten the break by up to 22us. However in discovery responses that 2us filter eats into the start bit of the first preamble of the DUB response, which is potentially a problem. You're permitted to eat the first byte of that, but since we are not protocol aware we don't know what we are receiving.

As you say, the two rules are: You can shorten the break by 22us. And you can remove 0 or 1 byte from the discovery response.

But: Look at the DUB response preamble waveform: 0xFE is 00111111111 binary on the wire (if you include the start bit and 2 stop bits). With a scope, that looks like an 8 us low followed by a 36 us high.

If you shorten the first byte of the DUB response by 2us it will corrupt the preamble data byte. But if you shorten it by 8us to 36us it will effectively remove the entire first byte (you'll eliminate the first two zero bits, leaving behind some extra one bits that will look like an idle to the receiver).

Combine these two and you can shorten the beginning of a response by 8 to 22us without needing to know whether it's a DUB response (without a break) or a normal response (with a break).
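The waveform arithmetic can be checked with a few lines of Python. This is just a worked example: it assumes 4us per bit (250 kbit/s), LSB-first data bits, and the 2 stop bits and 22us figure quoted above; the helper names are made up.

```python
BIT_US = 4  # one bit time at 250 kbit/s


def wire_bits(byte, stop_bits=2):
    """Start bit, eight data bits LSB first, then the stop bits."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1] * stop_bits


def runs_us(bits):
    """Collapse a bit list into (level, duration_us) runs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + BIT_US)
        else:
            runs.append((bit, BIT_US))
    return runs


preamble = wire_bits(0xFE)    # 0 0 1 1 1 1 1 1 1 1 1
waveform = runs_us(preamble)  # one low run followed by one high run
low_us = waveform[0][1]

MAX_BREAK_SHORTENING_US = 22  # allowed shortening of a normal response's break
safe_trim_window_us = (low_us, MAX_BREAK_SHORTENING_US)
```

Here `waveform` comes out as an 8us low followed by a 36us high, and `safe_trim_window_us` is the 8 to 22us range: at least the whole low part of a preamble byte, and no more than a break is allowed to lose.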

The standard has all of the elements necessary to allow both protocol-aware, and non protocol-aware, inline devices. There are many different ways to put the pieces together and get a functional design. It does take a while to think it through (my first splitter made all of these mistakes and more...)

Andrew_Parlane · September 16th, 2020

OK, great, thanks for the info. That gives me plenty to be working on.

Thanks again,
Andrew
