Tuesday, April 24, 2012

Optimization - Gradient (Steepest) Descent

Introduction

So you have an equation with 20 unknown variables.  You've taken 1000 data points and need to get an equation to represent this data.  Why would you need to do this?

Well it gives you a model of the data, which in turn can be used for simulations and estimations.

There are a lot of common uses for optimization.  One use (in my field) is circuit optimization.  When designing circuits, you sometimes need to fine tune the circuit for insertion loss or return loss.  There have been times I've gotten stuck and used optimization to improve these parameters.  In this article I'm going to give you a general understanding and try to provide a simple example.

General Algorithm for Gradient Descent Optimization

Gradient descent attempts to minimize a cost function by computing the gradient of the function and adjusting the variables in the opposite direction.  The algorithm will usually converge to a minimum, though not always, and often only to a local minimum rather than the global one.

This algorithm can be slow since you must compute a partial derivative for each unknown.

Cost Function
The first thing that needs to be created is a cost function.  A cost function is a function that we want to minimize.

For example, let's say I have a circuit that is mismatched.  I want to match it across a set of frequencies.  Instead of doing it by hand, I simply create an optimization algorithm.  I want my return loss to be at least 30 dB.  So the cost function (using least squares) would be:

$$ C = \sum_{i=1}^{n} \left( \Gamma_{goal} - \Gamma_{eqn}(f_i) \right)^2 $$
where Gamma_goal is the 30 dB target and Gamma_eqn is the output of our model.

The above is an example of the least-squares cost function.  We're minimizing the difference between these two values.  Gamma_goal is constant, while Gamma_eqn is the model equation; in this case it's the reflection coefficient.

Although the cost function can be anything, I tend to use least-squares most often.  Here is a more generic form:

$$ C(\beta) = \sum_{i=1}^{n} \left( y_i - f(x_i, \beta) \right)^2 $$
where beta represents the parameters we're trying to find, and yi and xi are the data points.  In the case above, yi is the 30 dB goal; I would have used that value for every point.

Model Equation
The next problem is to create a model.  The model can be any equation with unknown parameters.  You can use this technique to fit lines to data, as Excel does.  In that case we would use an equation like so:

$$ y = a_0 + a_1 x $$
Using our data, we would be trying to find a0 and a1.  To find those parameters, we use the gradient descent algorithm.
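
To make this concrete, here is a minimal Python sketch of the two pieces so far: the line model and its least-squares cost function.  The data points and the initial guess are made up purely for illustration.

    import numpy as np

    # Made-up sample data points (x_i, y_i), purely for illustration.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([19.8, 22.1, 24.0, 26.2, 27.9])

    def model(params, x):
        """Line model: y = a0 + a1*x."""
        a0, a1 = params
        return a0 + a1 * x

    def cost(params):
        """Least-squares cost: sum of squared errors between the
        data and the model's prediction."""
        return np.sum((y - model(params, x)) ** 2)

    print(cost(np.array([20.0, 2.0])))  # cost at an initial guess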

Initial Value
Before we begin with the gradient descent, I must briefly speak about initial values.  Initial values for the parameters ai must be set prior to the algorithm.

These initial values do matter (and sometimes to a large degree).  This is largely because the cost function can have multiple valleys.  We want to find the global minimum of the function, but gradient descent only locates a local minimum.  If we're lucky that local minimum is the global one, but this is not always the case.

So choose values that you believe to be close when using the gradient descent algorithm.  This will increase the chances of finding the global minimum.

For instance, if we're performing line fitting, and we see the trend of the line passing through the y-axis at 20 then we should set a0 to 20.

Judgement must be used in picking initial values.

Gradient Descent Algorithm

The algorithm is based on the partial derivatives of the cost function with respect to the parameters ai.  Below is the equation used to determine the new value of each ai:
$$ a_i' = a_i - \gamma \frac{\partial C}{\partial a_i} $$
The gamma is called the learning rate.  The new value ai' is calculated from the current ai.  The learning rate should be chosen so that the jumps in the parameters are small enough not to overshoot the minimum, yet large enough that convergence doesn't take forever.

This also takes judgement but typically dynamic learning rates are used.  These are learning rates that change based on the increase/decrease of the gradient.

Now the fun part... the partial derivative.  In order for this algorithm to work, each partial derivative must be calculated.  On a PC, an estimate works just fine for most applications.  (It would defeat the purpose if we had to derive an exact equation for each one.)
$$ \frac{\partial C}{\partial a_i} \approx \frac{C(a_1, \ldots, a_i + \epsilon, \ldots, a_n) - C(a_1, \ldots, a_i, \ldots, a_n)}{\epsilon} $$
Epsilon is the small change in the parameter used to approximate the limit.  In exact arithmetic, the closer this number is to zero the better.  On a PC, though, floating-point round-off sets a practical floor: with double precision, a step on the order of 1E-7 (scaled to the parameter's size) usually gives the best estimate, and going much smaller lets round-off error dominate.

You must calculate all of these partial derivatives before changing the values.  Never use new values of the parameters before all the derivatives have been calculated.  This is a common mistake.

Once all of them are calculated, you can determine the new parameters.  Then you repeat with the new values to find a closer approximation.
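
Putting the pieces together, here is a rough Python sketch of the full loop: estimate every partial derivative from the same base parameter values, update all parameters at once, and shrink the learning rate whenever a step fails to reduce the cost.  The epsilon, rate, and tolerance values are illustrative defaults, not tuned recommendations.

    import numpy as np

    def numerical_gradient(cost, params, eps=1e-7):
        """Estimate every partial derivative with a forward difference.
        Each parameter is perturbed from the SAME base values; nothing
        is updated until all derivatives are computed."""
        grad = np.zeros_like(params)
        base = cost(params)
        for i in range(len(params)):
            p = params.copy()
            p[i] += eps
            grad[i] = (cost(p) - base) / eps
        return grad

    def gradient_descent(cost, params, rate=1e-3, iters=10000, tol=1e-12):
        """Plain gradient descent with a crude dynamic learning rate:
        if a step increases the cost, halve the rate and retry."""
        c = cost(params)
        for _ in range(iters):
            grad = numerical_gradient(cost, params)
            new_params = params - rate * grad        # step against the gradient
            new_c = cost(new_params)
            if new_c > c:                            # check: cost must decrease
                rate *= 0.5
                continue
            converged = abs(c - new_c) < tol
            params, c = new_params, new_c
            if converged:
                break
        return params

    # Example usage with the line-fit data from the earlier sketch:
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([19.8, 22.1, 24.0, 26.2, 27.9])
    cost = lambda p: np.sum((y - (p[0] + p[1] * x)) ** 2)
    print(gradient_descent(cost, np.array([20.0, 2.0])))

Note that the gradient routine deliberately works from a frozen copy of the parameters; this is exactly the "no parameter update until all derivatives are done" rule above.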

After enough iterations your parameters will start to converge to fixed values.  If this happens you have found a valid solution.  Congrats!

There are caveats to this algorithm you must be aware of.  (Depending on how you write the update, the learning-rate term appears with a plus or a minus sign; the step must always go against the gradient.)  It's important to perform checks along the way to make sure the cost function is actually being reduced.  If you don't do this, it's possible for this algorithm to diverge.

Summary


The algorithm is fairly simple.  Below is what must happen:
  1. Develop a Model Equation for your data
  2. Develop the Cost Function
  3. Set Initial Values for the model parameters
  4. Determine all partial derivatives for the parameters (no parameter update)
  5. Update parameters with the partial derivatives.
  6. Perform any necessary checks
  7. Repeat 4 - 6 until convergence
That should be about it.  When you actually go to build this algorithm, you will find various details that must be handled to make it efficient.

Good Luck and ask questions if you need help!

Sunday, September 11, 2011

Beyond SPICE

So here I am, working on a new version of my previous SPICE software.  Of course, something called SPICE is usually just a linear circuit simulator.  When you start adding nonlinear models with S-Parameter, Harmonic Balance, and various Large-Signal simulations, you start to move past SPICE into something a little more :).  So I have named this new version Beyond SPICE to show that it will be greater than a simple SPICE program.

Over the last 2 weeks I have started to create the basis of the software.  I'm starting (for the most part) from scratch.  My last version was only able to handle transient analysis for resistors/capacitors/inductors/current sources/voltage sources.

This new version will handle this plus everything else.

Up to this point, I have the following elements in their first versions:
  • Complex Numbers and Parser:
    • The parser can parse directly to a complex number or can separate the equation into an expression tree.  I will explain this in another post.
    • Can handle all variations of complex numbers and functions, including hyperbolic functions.
  • Matrices:
    • A number of matrix classes have been created.  Namely, a string matrix, complex constant matrix, and an expression tree matrix.
    • The Complex Constant Matrix and Expression Tree Matrix support inversion and determinants.  The simple things.
Although this may not seem like much, the time spent on these 2 major class libraries has a vast impact on the speed of the final product.  For instance, the Expression Tree allows a complex statement to be parsed only once and evaluated many times (POEM - Parse Once Evaluate Many).  This means that parsing doesn't need to occur on every iteration.  In fact, incorporating this into a matrix speeds things up even more.
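
As a toy illustration of the POEM idea in Python (this is just a sketch of the concept, not the actual classes from Beyond SPICE): the expression is turned into a tree a single time, and the tree is then evaluated repeatedly with different variable values, for example once per frequency point.

    # Tree node types: constants, variables, and operators.
    class Num:
        def __init__(self, value):
            self.value = value
        def eval(self, env):
            return self.value

    class Var:
        def __init__(self, name):
            self.name = name
        def eval(self, env):
            return env[self.name]

    class Add:
        def __init__(self, left, right):
            self.left, self.right = left, right
        def eval(self, env):
            return self.left.eval(env) + self.right.eval(env)

    class Mul:
        def __init__(self, left, right):
            self.left, self.right = left, right
        def eval(self, env):
            return self.left.eval(env) * self.right.eval(env)

    # "Parse" the expression 2*s + 3 once (built by hand here; a real
    # parser would produce the same tree)...
    tree = Add(Mul(Num(2 + 0j), Var("s")), Num(3 + 0j))

    # ...then evaluate it many times, e.g. at each frequency point.
    for s in (1j, 2j, 3j):
        print(tree.eval({"s": s}))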

Secondly, the invertibility of the Expression Tree Matrix allows the inversion to occur only once; the inverse matrix can then be evaluated directly without having to perform the inverse at each time step.

Taking the time to develop these algorithms will show great improvement in performance down the road.

Hopefully I can keep you guys up to date more often.  Any questions, feel free to email me.

---------------------
Justin Coulston
justin.coulston@gmail.com

Monday, July 4, 2011

I'm not gone

I just wanted to make sure everyone knows that I'm not down and out yet.  I know it's been a while, but I plan on getting some new material into the blog.  A few of the new things I'll be talking about soon will be about some design techniques that can be employed for RF Engineering.  Some of these may seem simple and archaic, but can still be very useful when designing RF Chains.

Some of the topics will include:
  • Filter Design Techniques
  • Impedance Matching Techniques
  • Network Analysis Mathematics
  • Neural Networks in RF Engineering
  • Using software design for hardware problems
  • and more...
Hopefully, I can start introducing some of this material over the next year.  Look for it!

Sunday, May 2, 2010

Signal Processing: Modulation/Demodulations - The Basics

I want to give a basic understanding of Modulation and Demodulation, the process in which we transform signals into different waveforms by changing carrier properties (amplitude, frequency, phase, etc.).  I want to state that I am relatively new to modulation techniques and can only offer minimal insight into its details.  Regardless...

Modulation, as above, is a process in which signals are transformed into different waveforms by changing basic properties.  For instance, say we have a bit stream we want to send over the air.  We can't just pulse 5V directly to an antenna to send the information.  We must first modulate the signal into a usable RF signal somehow...

There are a number of ways to do this:

  1. Amplitude Modulation: The process of using different amplitudes to represent different values.
  2. Frequency Modulation: The process where we change the frequency to represent different values.
  3. Phase-Shift Modulation: The process where we change the phase of a signal to represent different values.
[Figure: example modulated waveforms, courtesy National Instruments]
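
As a rough numpy sketch of those three schemes, here is one way the waveforms might be generated for a short bit stream.  The bit pattern, carrier frequency, and deviation are arbitrary illustration values, and the FM line uses the simplest (discontinuous-phase) form of frequency keying.

    import numpy as np

    bits = np.array([1, 0, 1, 1, 0])        # example bit stream
    fc = 10.0                               # carrier frequency (arbitrary units)
    t = np.linspace(0, len(bits), len(bits) * 200, endpoint=False)
    sym = bits[np.minimum(t.astype(int), len(bits) - 1)]  # bit under each sample

    am  = (0.5 + 0.5 * sym) * np.sin(2 * np.pi * fc * t)   # amplitude keying
    fm  = np.sin(2 * np.pi * (fc + 5.0 * sym) * t)         # frequency keying
    psk = np.sin(2 * np.pi * fc * t + np.pi * sym)         # 180-degree phase keying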


There are many more, but these are the basics.  So we would have to come up with a conversion scheme for our digital stream.  You would think this is the easy part (and in some cases it is), but the issue that arises is bandwidth.

Many times we must keep signals within certain FCC bandwidths.  For example, I may have been allocated a certain bandwidth from 45.6MHz to 46.2MHz.  If I transmit outside this range, I'm in trouble.  So modulation will usually be based around this constraint first.  After transforming the digital signal we would then modulate the signal onto a 46MHz waveform to stay within bounds.  (Of course this is simplified)

Regardless, these are the basics of modulation from a systems point of view.  From the circuit point of view, we use mixers, PLLs, and oscillators.

Demodulation is just the inverse of this process.  We will generally filter out the frequency we are looking for (so we don't get other garbage from the air) and reverse the process to retrieve the original signal.
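
As a tiny sketch of that reversal, here is a coherent demodulator for the phase-shift-keyed waveform from the sketch above: mix with a local copy of the carrier, low-pass by averaging over each bit period, and read the sign.  Real receivers also need carrier and timing recovery, which is skipped entirely here.

    import numpy as np

    bits = np.array([1, 0, 1, 1, 0])
    fc, sps = 10.0, 200                     # carrier frequency, samples per bit
    t = np.linspace(0, len(bits), len(bits) * sps, endpoint=False)
    sym = bits[np.minimum(t.astype(int), len(bits) - 1)]
    psk = np.sin(2 * np.pi * fc * t + np.pi * sym)          # received signal

    mixed = psk * np.sin(2 * np.pi * fc * t)                # mix with local carrier
    per_bit = mixed.reshape(len(bits), sps).mean(axis=1)    # crude low-pass filter
    recovered = (per_bit < 0).astype(int)                   # 180-degree phase -> negative mean
    print(recovered)                                        # matches the original bits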

That's it.  There isn't much more I can tell you about modulation/demodulation.  It's crazy: I use it often where I work as an RF engineer, but I know so little... (I am new to the industry though).

Hope this is somewhat useful.

------------------------
Justin Coulston
justin.coulston@gmail.com

Saturday, May 1, 2010

Simulation: NPR Test Simulation

So during my time at work today (yes, I worked on Saturday), I attempted to simulate an NPR test for an amplifier in Agilent's EESof/ADS.  Now there is a way to accomplish this using the DSP portion along with the Analog Envelope technique, but I haven't found a way to do it directly with the RF/Analog system.  This is quite the disappointment...

Figure 1: NPR Test Stimulus (courtesy Agilent)


So I have been researching solutions.  Only one has presented itself up to this point: Spectral Balance simulations.  Unfortunately, ADS doesn't support this directly (at least not to my knowledge).  This is different from the typical Harmonic Balance simulation most people are used to.  The main difference is that the Spectral Balance simulation stays entirely in the frequency domain, while Harmonic Balance shuttles between domains, evaluating the nonlinear devices in the time domain and converting back and forth with FFTs (of course it's more complex than this).


So how do I get around it?  I really have no idea right now.  The fact is, this is a known issue.  Why hasn't someone developed this into the more popular SPICE simulations?  There are random Matlab tools that can do the trick but not directly in SPICE packages.


Anyways, that's my rant for the day.  I'll go into more detail, hopefully, at a later date.


---------------------------------
Justin Coulston
justin.coulston@gmail.com

Friday, April 30, 2010

RF: Noise to Power Ratio (NPR) Test

Introduction
NPR, or Noise to Power Ratio, is a figure of merit describing an amplifier's linearity.  When I say linearity, I am referring to the amplifier's ability to keep its gain flat versus output power.  Of course it's deeper than this, but this is a good way to understand it.

Now many people know of different linearity figures of merit, OIP3 for instance.  Unfortunately, OIP3 is only useful in certain instances: single carriers or very few carriers.  But what about OFDM or QAM modulation techniques that have closely spaced signals?

This is where an NPR test is useful.  It uses a large number of carriers and measures how linear the amplifier remains.  I'm not going into great detail on the mathematics, but I will cover the basics of how it works.

The Basics
The most common way to perform an NPR test is to use 5 main devices in the lab:

  1. AWGN source (Noise Source): This is used to simulate the many frequency tones.
  2. Bandpass Filter: A filter used to bandlimit the noise into a usable bandwidth. Generally, this is the waveform's bandwidth.
  3. Notch Filter: A filter, or method, of taking out a small section of the noise, creating a "notch" in the middle of the waveform.
  4. AUT: This is the amplifier that is being tested, generally a power amplifier.
  5. Spectrum Analyzer: Used to take the measurements. (Could be a Network Analyzer)
So in order to perform the test, you hook the source up to the bandpass filter, then the notch filter, followed by the AUT, and finally the spectrum analyzer.  Generally the notch width will be set to the bandwidth of the signal separation.

Ultimately, the point of this setup is to see how much the notch is filled in after the sample noise waveform passes through the amplifier.  The idea is based on intermod theory: when many tones are very close together, they "intermodulate" with each other and form intermodulation products at other nearby frequencies.  This is how the notch "fills up," by the intermodulation products from the other tones.  Based on this we can find how linear the amplifier is at different noise power levels.
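
Here is a rough numerical sketch of that notch-filling effect in Python: band-limited noise with a notch is pushed through a memoryless third-order nonlinearity standing in for the amplifier, and the output spectrum shows the notch partially filled by intermod products.  The bandwidths and the nonlinearity coefficient are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2 ** 16
    noise = rng.standard_normal(n)

    # Band-limit the noise and cut the notch directly in the frequency domain.
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n)                          # normalized frequency, 0..0.5
    spectrum[(freqs < 0.10) | (freqs > 0.20)] = 0       # bandpass filter: 0.10-0.20
    spectrum[(freqs > 0.145) & (freqs < 0.155)] = 0     # notch in the middle
    stimulus = np.fft.irfft(spectrum)

    # Memoryless third-order nonlinearity standing in for the amplifier.
    output = stimulus - 0.1 * stimulus ** 3

    out_spec = np.abs(np.fft.rfft(output)) ** 2
    in_band = out_spec[(freqs > 0.11) & (freqs < 0.14)].mean()
    in_notch = out_spec[(freqs > 0.147) & (freqs < 0.153)].mean()
    print("NPR ~ %.1f dB" % (10 * np.log10(in_band / in_notch)))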

Conclusion
So there is a lot here that wasn't said, but I hope you have a basic idea of what NPR is for and why it works.  An NPR test basically applies intermod theory to a large number of tones to see how much they intermodulate with each other.  After that, it's simple math using dBc/Hz reference powers of spectral density.  If you desire more information or examples, I'll be more than happy to help.  Just email me or comment. Later!

-------------------------------
Justin Coulston
justin.coulston@gmail.com

Thursday, April 29, 2010

RF: Broadband Matching

Introduction
Today at work, I came to the conclusion that Broadband matching is a fine art.  It takes years of experience to understand and get it right.  Even the most experienced RF Engineers still have issues developing a matching network for amplifiers and power transistors.

I'm going to give a brief overview of Broadband matching and how it applies to the RF engineer.

The Basics
A matching network is simply any network that can transform impedances from its input to its output.  Generally, they consist of either LC networks or microstrip equivalents, and they come in the form of filter-like structures.  I have mostly used strings of networks with high-pass and low-pass structure.

There are many ways to match networks and many structures you can use.  At low frequencies (under 1 GHz) and narrow bandwidths (under 100 MHz) it's generally easier to match using the Smith Chart.  At higher frequencies and wider bandwidths, I recommend a more trial-and-error approach.  This is what broadband matching entails (wide bandwidths).  There is no single method for broadband matching, so I'll show you my method, and only for amplifiers/transistors.

My Method
So let's say we have an RF amplifier.  Every amp has an inherent input impedance and an inherent output impedance, which you can model as impedances sitting just inside the input and output pins.
Now generally, input impedances are meant to be extremely high (megaohms in op-amps), but in RF amps they can be very small (sometimes 2 - 3 ohms).  The output impedance can be almost anything, but is generally small as well.

Now, something to notice about these inherent impedances: most of the time they are not strictly resistive and will be reactive as well.  I noticed today that the amp I was using was in fact inductive at high frequencies, so I had to match accordingly.
Since the input looked like a resistance Rin in series with an inductance Lin, I began my matching with a series capacitor.
So you see, I used the internal inductor as my first component, then used the series capacitor to complete the first section of the match (a high-pass structure).

To shorten this post up a little bit, I will say that you then put a shunt capacitor as the next component, then a series inductor, etc., etc.  Repeat those until you have a match, and do the same for the output.
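
As a small numerical illustration of that structure (all element values are invented for the example, not taken from any real design), here is how you might sweep the return loss of the internal-inductor, series-capacitor, shunt-capacitor network with numpy.  The values below happen to center the match near 1 GHz; in practice they would come out of the optimizer mentioned below.

    import numpy as np

    z0 = 50.0                        # system impedance (ohms)
    rin, lin = 10.0, 5e-9            # assumed internal R and L of the amp input
    cs, cp = 14e-12, 6.4e-12         # series and shunt caps of the matching network

    f = np.linspace(0.5e9, 2e9, 201)
    w = 2 * np.pi * f
    z_amp = rin + 1j * w * lin                   # internal impedance of the amp
    z1 = z_amp + 1 / (1j * w * cs)               # add the series capacitor
    z_in = 1 / (1 / z1 + 1j * w * cp)            # add the shunt capacitor
    gamma = (z_in - z0) / (z_in + z0)            # reflection coefficient vs 50 ohms
    rl_db = -20 * np.log10(np.abs(gamma))        # return loss in dB
    print("best return loss: %.1f dB at %.2f GHz"
          % (rl_db.max(), f[rl_db.argmax()] / 1e9))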

Now the big question you'll be asking is "what about the values?"  Well, this is where the art comes in.  It's not a very simple topic to go through: it depends completely on your frequency range, your return loss goals, and how many network sections you use.  I generally use an optimizer in Agilent's ADS Simulation Software (EESof).

But I will give you a general overview of what "type" of values you'll need for certain frequency ranges:

HIGH FREQUENCY (500MHz-2GHz)
  • Shunt C: Very low values (0.1pF - 2pF)
  • Shunt L: Medium values (50nH - 100nH)
  • Series C: Medium RF values (20pF - 50pF) [These are more dependent on other factors]
  • Series L: Very low values (1.6nH - 15nH)
LOW FREQUENCY ( < 500MHz)
  • Shunt C: Medium values (100pF - 1uF)
  • Shunt L: High Values (1uH - 100uH) [This is a guesstimate]
  • Series C: High RF values (1000pF - 10uF)
  • Series L: Medium values (100nH - 10uH)
I hope this post is somewhat useful.  I know it's vague on details, but I wanted to give you an idea of broadband matching; I can't really write a whitepaper in a post.  If you have specific questions, email/comment and ask.  Later!

--------------------
Justin Coulston
justin.coulston@gmail.com